Abstract: We develop asymptotic theory for weighted likelihood estimators
(WLE) under two-phase stratified sampling without replacement.
We also consider several variants of WLE's involving estimated weights 
and calibration. A set of empirical process tools is developed, including
a Glivenko-Cantelli theorem, a theorem for rates of convergence of Z- 
estimators, and a Donsker theorem for the inverse probability weighted em- 
pirical processes under two-phase sampling and sampling without replace- 
ment at the second phase. Using these general results, we derive asymptotic 
distributions of the WLE of a finite dimensional parameter in a general 
semiparametric model where the nuisance parameter is estimable
at either a regular or a non-regular rate. We illustrate these results
and methods in the Cox model with right censoring and interval censor- 
ing. We compare the methods via their asymptotic variances under both 
sampling without replacement and the more usual (and easier to analyze) 
assumption of Bernoulli sampling at the second phase. 
1. Introduction 



Two-phase sampling is a sampling technique that aims at cost reduction and 
improved efficiency of estimation. At phase I, a large sample is drawn from a 
population, and information on variables that are easier to measure is collected. 
These phase I variables may play an important role in statistical analysis such 
as exposure in a regression model, or they may simply be auxiliary variables 
that are correlated with variables unavailable at phase I but are not of interest 
in themselves. The sample space is then stratified using only the phase I data. 
At phase II, a subsample is drawn without replacement from each stratum, and 
phase II variables that are costly or difficult to obtain are measured. Strata 
formation is intended either to oversample subjects with important phase I 
variables, or to effectively sample subjects 
with phase II variables correlated with phase I variables, or both. This way, 
two-phase sampling achieves effective access to important variables with less 
cost and, as a result, enhances efficiency of estimation. 

Two-phase sampling was originally introduced in survey sampling by [26] 
for estimation of the "finite population mean" of some variable. Since then, 
this sampling method together with the Horvitz-Thompson estimator [14] as a 
standard estimator has been widely adopted in survey sampling. Much later, 
two-phase designs were introduced to biostatistical applications where, in con- 
trast to survey sampling, the population is infinite and the parameter in the 
statistical model is of interest rather than the average of some variables in a 
finite population. Notable examples of two-phase designs include those given in: 
[28, 38] (who considered fitting Cox models for the case-cohort design stratifying 
a cohort by the censoring indicator); [44] (who considered additional stratification 
on rare exposure in the setting of the case-control design); and [2] (who 
extended the case-cohort study of [28] by additionally stratifying on covariates). 
More recently the broad applicability and importance of two-phase designs have 
been emphasized by [3, 4]. Because of these features, two-phase sampling has 
recently received more attention from practitioners, including biostatisticians, 
as an attractive study design. 

In this paper, we consider weighted likelihood estimation under two-phase 
sampling. The main difficulty in this problem is dependence among observations 
induced by the sampling scheme. The current practice for analysis of this type of 
data in biostatistics has been to assume independence (i.e. Bernoulli sampling) 
for theoretical simplicity. Specifically, statistical analysis is often carried out as 
if observations are obtained from stratified Bernoulli sampling rather than from 
stratified sampling without replacement at phase II. Because the asymptotic 
variances under our "without replacement" sampling scheme are shown here to 
be smaller than under stratified Bernoulli sampling (see also [6]), use of variance 
estimates based on Bernoulli sampling calculations results in over-estimates of 
variances and hence conservative conclusions. Despite theoretical difficulties, [6] 
developed a method to derive asymptotic distributions of the weighted likelihood 
estimator (WLE) under stratified sampling without replacement. Our goal is to 
complement and extend their results in several directions. 

In addition to obtaining a precise asymptotic variance, there are several ad- 
vantages to use of the WLE and improvements based on "estimated weights" or 
calibration. It may be true that one is willing to assume independence because 
the WLE is generally inefficient under the independence assumption. However, 
even when this assumption holds, there are not many statistical models where 
efficient estimators are known (see [30], [29], and [5] for some exceptions). 
Moreover, efficient estimators, if known, may require sophisticated numerical 
techniques (e.g., solving integral equations, [25]) or restrictive assumptions that 
are not imposed when complete data is available (e.g. a parametric covariate 
distribution [9], discrete covariates [25]). In contrast, the maximum likelihood 
estimator with complete data is often available in many applications, and the 
corresponding WLE is obtained simply via a weighted likelihood version of the 
same likelihood equations. Furthermore, theory for the WLE requires conditions 
almost identical to those for the MLE with complete data (see Theorems 3.1 
and 3.2 below). Another related advantage of the WLE and its variants 
involving estimated weights or calibration is robustness to model 
misspecification: when the underlying model is misspecified, the WLE's and their 
relatives continue to estimate the same parameters as would be estimated 
under model misspecification with complete data. (For example, see [17], [13], 
[36, 37], [45], and [18]. See also [35] for careful further considerations of this 
issue.) 

Survey sampling has a long history of dealing with dependent observations 
from stratified sampling without replacement and even more complicated de- 
signs. In fact, asymptotic results have been established for more complex survey 
designs (see [16] and references therein). However, even when a model is postu- 
lated, survey statisticians are usually interested in a "finite population param- 
eter" , which is usually defined as the solution of some estimating equation in a 
finite population [32], not a "super-population parameter" which determines a 
scientific phenomenon for an infinite population. Asymptotic theory in survey 
sampling thus usually treats a sequence of finite populations with increasing 
sample sizes based on conditions regarding designs conditional on observations 
consisting of a finite population [16]. Because biostatisticians are more inter- 
ested in super-population parameters, our asymptotic theory is relevant in the 
usual biostatistical settings. 

One notable exception in the survey sampling literature is [32] (see also refer- 
ences therein and [19], [10]). In [32], the authors define the product of the model 
space and the design space as a probability space, and decompose their normal- 
ized estimator into the contributions from a super-population and a sampling 
design. Two distinct sets of conditions for the model space and design space are 
used to guarantee the asymptotic normality of each contribution respectively. 
The former and the latter conditions are familiar to biostatisticians and survey 
statisticians, respectively, but not vice versa. See [31] for two sets of conditions 
in an application to the Cox model under cluster sampling. 

Our approach relies on the framework and some of the results developed 
in [6]. In [6], the inverse probability weighted empirical process is decomposed 
into the usual empirical process (phase I contribution) and the weighted sum 
of finite sampling empirical processes (phase II contribution). (Compare (10) 
of [6] with the decomposition (A.8) of [32].) Then, conditional on the phase I 
data, the results for exchangeably weighted bootstrap empirical processes [27], 
which cover our sampling scheme, are applied to show the weak convergence of 
the phase II contribution. Despite some similarity to the framework involving 
model and design spaces, our framework is different from [32] in the following 
important points. First of all, we do not need additional conditions for the 
phase II contribution unlike design conditions imposed in [32] because the same 
conditions for the phase I contribution suffice for the exchangeably weighted 
bootstrap empirical process theory to apply in our setting. Second, our method 
is more general since our decomposition is at the process level, not the level 
of random variables. Third, our formulae for asymptotic variances have more 
natural interpretations than the formulae in the framework of [32] that consist 
of two incongruent parts, one that depends on the model conditions and the 
other that depends on the design conditions. For these reasons, our approach 
should be distinguished from those in survey sampling. 

The main results of our paper are two Z-theorems giving weak sufficient 
conditions for asymptotic distributions of the WLE's in general semiparamet- 
ric models. The first theorem covers the case where the nuisance parameter is 
estimable at a regular ($\sqrt{n}$) rate, while the second theorem serves for the case 
where only estimators with slower than $\sqrt{n}$ convergence rates are available. In 
addition to the plain WLE, we include the WLE's with estimated weights and 
(variants of) calibration in the formulations of both theorems. These estimators 
are obtained when adjusting weights in the WLE by estimated weights [30], cal- 
ibration [12], modified calibration [8] or our new method, centered calibration, 
respectively. The first two methods are used in practice to try to gain efficiency 
over the WLE (see [20] for a recent review and discussion), and the third method 
was recently proposed to improve on the original calibration. It is of some in- 
terest to understand which method improves efficiency in comparison to the 
plain WLE. The weighted likelihood estimator is already well-studied in cases 
with regular rates. [6] derived the asymptotic distribution of the WLE under 
stratified Bernoulli sampling and stratified sampling without replacement. [7] 
studied the WLE with estimated weights under stratified Bernoulli sampling, 
and showed efficiency gains over the plain WLE. [3, 4] obtained in a heuristic 
way the asymptotic distributions of the WLE's with estimated weights and 
the calibrated WLE under stratified sampling without replacement. One of the 
difficulties in the derivations in [3, 4] involves the lack of a proof of asymp- 
totic equicontinuity of certain stochastic processes under dependence. A similar 
difficulty is also recognized by [19] in the context of complex surveys. Direct 
application of empirical process theory does not help due to lack of indepen- 
dence among observations. Another difficulty, which is also seen in other papers, 
concerns (lack of) proofs of consistency of estimators under dependence. When 
a nuisance parameter is not estimable at a regular rate, no general consistency, 
rate of convergence, or asymptotic normality results are known in the framework 
of two-phase designs to the best of our knowledge. 

The main contributions of our paper are three-fold. First, we rigorously justify 
the results of [3, 4] with weaker conditions. We further extend the result 
to the case where the nuisance parameter is not estimable at a regular rate. 
The conditions of our theorems are formulated in terms of complete data, not 
two-phase sampling data, and, moreover, they are almost identical to those for 
the MLE with complete data. Thus, most of them may be already established in 
many applications. For the conditions requiring verification, tools from empirical 
process theory will be applied. Second, we develop a new method which yields 
improved efficiency over the plain WLE under our sampling scheme. In fact, our 
method of centered calibration is the only guaranteed method, among all meth- 
ods considered in this paper, to gain efficiency under both stratified Bernoulli 
sampling and stratified sampling without replacement while other methods are 
warranted only for stratified Bernoulli sampling. Third, we establish general 
results for the inverse probability weighted (IPW) empirical process, which is 
defined in the next section. Some results such as the Glivenko-Cantelli theorem 
(Theorem 5.1) and the Donsker theorem (Theorem 5.3) are of interest in their own 
right. These results, accounting for dependence of observations due to the sam- 
pling design, are used to prove our Z-theorems in place of the usual empirical 
process theory. More importantly, they are useful in verifying the conditions 
of Z-theorems in applications. For instance, our Theorem 5.2 easily establishes 
rates of convergence under our "without replacement" sampling scheme. Also, 
consistency can be verified with the aid of the Glivenko-Cantelli theorem. We 
illustrate application of the general results with examples in Section 4. 

The rest of the paper is organized as follows. In section 2, we introduce our 
sampling scheme and estimation procedures in a general semiparametric model. 
The WLE and methods involving adjusted weights intended to improve on the 
efficiency of the WLE are discussed. Two Z-theorems are presented in section 
3 to derive asymptotic distributions of the WLE's of the finite dimensional pa- 
rameter. All estimators are compared under Bernoulli sampling and sampling 
without replacement with different methods of adjusting the weights. We ap- 
ply our Z-theorems to the Cox model, both with right censoring and interval 
censoring, in section 4. The WLE of the cumulative baseline hazard function 
has regular rate of convergence in the first example, while it has cube-root rate 
in the second example. Section 5 consists of general results for IPW empirical 
processes. Several open problems are briefly discussed in Section 6. All proofs 
except those in section 4 and auxiliary results are collected in Section 7. 

2. Sampling, Models, and Estimators 

We now introduce our sampling scheme. Most of the following notation is based 
on [6]. Let $W = (X, U) \in \mathcal{W} = \mathcal{X} \times \mathcal{U}$ be the complete data with distribution $\tilde{P}_0$, 
where $X$ is the vector of the variables of interest with distribution $P_0$ and $U$ is a 
vector of auxiliary variables. At phase I, only a coarsening $\tilde{X} = \tilde{X}(X)$ of $X$ and 
the auxiliary variables $U$ are available for all $N$ subjects. The phase I data 
$V = (\tilde{X}, U) \in \mathcal{V} = \tilde{\mathcal{X}} \times \mathcal{U}$ are used to form the $J$ sampling strata $\mathcal{V}_j$ with 
$\bigcup_{j=1}^{J} \mathcal{V}_j = \mathcal{V}$, the $j$th of which consists of $N_j$ subjects for $j = 1, \ldots, J$. After stratified 
sampling, $X$ is fully observed for $n_j$ subjects in the $j$th stratum at phase II. 
The observed data is $(V, X\xi, \xi)$ where $\xi$ is the indicator of being sampled at 
phase II. We use a doubly subscripted notation by which $V_{j,i}$, for example, 
denotes $V$ for the $i$th subject in stratum $j$. We denote the stratum probability 
for the $j$th stratum by $\nu_j = \tilde{P}_0(V \in \mathcal{V}_j)$, and the conditional expectation given 
membership in the $j$th stratum by $P_{0|j}(\cdot) = \tilde{P}_0(\cdot \mid V \in \mathcal{V}_j)$. 
At phase II, samples of size $n_j \le N_j$ are drawn at random without replacement 
from each of the $J$ strata. The sampling probability is $P(\xi_i = 1 \mid V_i) = 
\pi_0(V_i) = n_j/N_j$ for $V_i \in \mathcal{V}_j$. These sampling probabilities are assumed to 
be strictly positive; that is, there is a constant $\sigma > 0$ such 
that $0 < \sigma \le \pi_0(v) \le 1$ for $v \in \mathcal{V}$. We assume that $n_j/N_j \to p_j > 0$ for 
$j = 1, \ldots, J$ as $N \to \infty$. Although dependence is induced among the observations 
$(V_i, \xi_i X_i, \xi_i)$ by the sampling indicators, the vector of sampling indicators 
$(\xi_{j,1}, \ldots, \xi_{j,N_j})$ within each stratum is exchangeable for $j = 1, \ldots, J$, 
and the $J$ random vectors $(\xi_{j,1}, \ldots, \xi_{j,N_j})$ are independent. 
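
As a minimal sketch of the phase II design just described (not taken from the paper), the following Python code draws the sampling indicators $\xi_i$ by simple random sampling without replacement within each stratum and computes $\pi_0(V_i) = n_j/N_j$. The function names, the stratum labels, and the sizes used below are illustrative assumptions.

```python
import numpy as np

def phase_two_indicators(strata, n_per_stratum, rng):
    """Draw xi_i by sampling n_j subjects without replacement in each stratum j."""
    strata = np.asarray(strata)
    xi = np.zeros(strata.shape[0], dtype=int)
    for j, n_j in n_per_stratum.items():
        members = np.flatnonzero(strata == j)          # subjects in stratum j
        chosen = rng.choice(members, size=n_j, replace=False)
        xi[chosen] = 1
    return xi

def sampling_probabilities(strata, n_per_stratum):
    """pi_0(V_i) = n_j / N_j for every subject i in stratum j."""
    strata = np.asarray(strata)
    pi0 = np.empty(strata.shape[0])
    for j, n_j in n_per_stratum.items():
        mask = strata == j
        pi0[mask] = n_j / mask.sum()
    return pi0

# Hypothetical phase I cohort: three strata of sizes 50, 30 and 20.
rng = np.random.default_rng(0)
strata = np.repeat([0, 1, 2], [50, 30, 20])
n_per_stratum = {0: 10, 1: 15, 2: 20}   # stratum 2 is sampled completely
xi = phase_two_indicators(strata, n_per_stratum, rng)
pi0 = sampling_probabilities(strata, n_per_stratum)
```

In contrast to Bernoulli sampling, the number of sampled subjects in each stratum is fixed at $n_j$ here, which is the source of the dependence among the $\xi_i$ within a stratum.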

One of the most important tools in empirical process theory is the empirical 
measure. However, the empirical measure is not directly applicable to estimation 
under two-phase sampling because some observations are not observed at phase 
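II. Instead, we define the inverse probability weighted (IPW) empirical measure 
by 

$$\mathbb{P}_N^{\pi} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_j}\frac{\xi_{j,i}}{n_j/N_j}\,\delta_{X_{j,i}},$$

where $\delta_{X_i}$ denotes a Dirac measure placing unit mass on $X_i$. The identity in the 
last display is justified by the arguments in Appendix A of [6]. We also define 
the IPW empirical process by $\mathbb{G}_N^{\pi} = \sqrt{N}(\mathbb{P}_N^{\pi} - P_0)$ and the phase II empirical 
process for the $j$th stratum by 

$$\mathbb{G}_{j,N_j}^{\xi} = \sqrt{N_j}\left(\mathbb{P}_{j,N_j}^{\xi} - \frac{n_j}{N_j}\,\mathbb{P}_{j,N_j}\right),$$

where, for $j \in \{1, \ldots, J\}$, $\mathbb{P}_{j,N_j}^{\xi} = N_j^{-1}\sum_{i=1}^{N_j}\xi_{j,i}\,\delta_{X_{j,i}}$ is the phase II empirical 
measure for the $j$th stratum, and $\mathbb{P}_{j,N_j} = N_j^{-1}\sum_{i=1}^{N_j}\delta_{X_{j,i}}$ is the empirical measure 
for all the data in the $j$th stratum; note that the latter empirical measure 
is not observed. Then, following [6], page 207, we decompose $\mathbb{G}_N^{\pi}$ as follows: 

$$\mathbb{G}_N^{\pi} = \mathbb{G}_N + \sum_{j=1}^{J}\sqrt{\frac{N_j}{N}}\left(\frac{N_j}{n_j}\right)\mathbb{G}_{j,N_j}^{\xi},$$

where $\mathbb{P}_N = N^{-1}\sum_{j=1}^{J} N_j\,\mathbb{P}_{j,N_j}$ and $\mathbb{G}_N = \sqrt{N}(\mathbb{P}_N - P_0)$. Notice that the 
phase II empirical processes $\mathbb{G}_{j,N_j}^{\xi}$ correspond to "exchangeably weighted bootstrap" 
versions of the stratum-wise complete data empirical processes $\mathbb{G}_{j,N_j} = 
\sqrt{N_j}(\mathbb{P}_{j,N_j} - P_{0|j})$, where $P_{0|j}$ is the conditional distribution of $X$ given 
membership in the $j$th stratum and $\mathbb{P}_{j,N_j}$ is as defined above. This observation allows 
application of the "exchangeably weighted bootstrap" theory of [27]. 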
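
The decomposition can be checked numerically for a single test function $f$. The sketch below is an illustration under stated assumptions rather than the paper's code: it takes the IPW empirical measure to be $\mathbb{P}_N^{\pi} f = N^{-1}\sum_i \xi_i f(X_i)/\pi_0(V_i)$, the phase II process to be $\mathbb{G}_{j,N_j}^{\xi} = \sqrt{N_j}\,(\mathbb{P}_{j,N_j}^{\xi} - (n_j/N_j)\mathbb{P}_{j,N_j})$, and verifies $\mathbb{G}_N^{\pi} f - \mathbb{G}_N f = \sum_j \sqrt{N_j/N}\,(N_j/n_j)\,\mathbb{G}_{j,N_j}^{\xi} f$. Since $P_0 f$ cancels on the left-hand side, no knowledge of $P_0$ is needed. The data and stratum sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N_j = np.array([60, 40])                 # stratum sizes N_j (hypothetical)
n_j = np.array([15, 20])                 # phase II sample sizes n_j
N = N_j.sum()
strata = np.repeat([0, 1], N_j)
x = rng.normal(loc=strata, scale=1.0)    # values f(X_i) of one test function f

# Phase II indicators: n_j of N_j drawn without replacement in each stratum.
xi = np.zeros(N)
for j in range(2):
    idx = np.flatnonzero(strata == j)
    xi[rng.choice(idx, size=n_j[j], replace=False)] = 1.0
pi0 = (n_j / N_j)[strata]                # pi_0(V_i) = n_j / N_j

def ipw_mean(f_vals, xi, pi0):
    """IPW empirical measure applied to f: N^{-1} sum_i xi_i f(X_i) / pi_0(V_i)."""
    return np.mean(xi * f_vals / pi0)

# Left side: G_N^pi f - G_N f = sqrt(N) (P_N^pi f - P_N f).
lhs = np.sqrt(N) * (ipw_mean(x, xi, pi0) - x.mean())

# Right side: sum_j sqrt(N_j / N) (N_j / n_j) G^xi_{j,N_j} f.
rhs = 0.0
for j in range(2):
    m = strata == j
    p_xi = np.sum(xi[m] * x[m]) / N_j[j]     # phase II empirical measure of f
    p_full = x[m].mean()                      # full stratum empirical measure of f
    g_xi = np.sqrt(N_j[j]) * (p_xi - (n_j[j] / N_j[j]) * p_full)
    rhs += np.sqrt(N_j[j] / N) * (N_j[j] / n_j[j]) * g_xi
```

A further consequence of sampling without replacement is that the IPW measure of the constant function 1 equals 1 exactly, since $\sum_i \xi_{j,i} N_j/n_j = N_j$ in every stratum; under Bernoulli sampling this holds only on average.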

2.1. Improving efficiency 

Efficiency of estimators based on IPW empirical processes can be improved by 
adjusting weights, either by estimated weights [30] or by calibration [12] via use 
of the phase I information; see also [20]. In addition to these two methods, we 
discuss two variants of calibration, modified calibration [8], and our proposed 
method, centered calibration. 

Let $Z_i = g(V_i)$ be the auxiliary variables for the $i$th subject for a known 
transformation $g$. For estimated weights through binary regression, the first $J$ 
elements of $Z_i$ are the membership indicators for the strata, $I_{\mathcal{V}_j}(V_i)$, $j = 1, \ldots, J$. 
Furthermore, observations with $\pi_0(V) = 1$ are dropped from binary regression, 
and the original weight 1 is used. For notational simplicity, we write $Z_i$ for either 
method, and assume that sampling probabilities are strictly less than 1 for all 
strata. 



2.1.1. Estimated weights 

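The method of estimated weights adjusts weights through binary regression on 
the phase I variables. The sampling probability for the $i$th subject is modelled by 
$p_{\alpha}(\xi_i \mid Z_i) = G_e(Z_i^T\alpha)^{\xi_i}\{1 - G_e(Z_i^T\alpha)\}^{1-\xi_i} = \pi_{\alpha}(V_i)^{\xi_i}\{1 - \pi_{\alpha}(V_i)\}^{1-\xi_i}$, where 
$\alpha \in \mathcal{A}_e \subset \mathbb{R}^{J+k}$ is a regression parameter and $G_e : \mathbb{R} \mapsto [0, 1]$ is a known 
function. If $G_e(x) = e^x/(1+e^x)$, for instance, then the adjustment simply involves 
logistic regression. Let $\hat{\alpha}_N$ be the estimator of $\alpha$ that maximizes the composite 
likelihood 

$$\prod_{i=1}^{N} p_{\alpha}(\xi_i \mid Z_i) = \prod_{i=1}^{N} G_e(Z_i^T\alpha)^{\xi_i}\{1 - G_e(Z_i^T\alpha)\}^{1-\xi_i}. \tag{2.2}$$

We define the IPW empirical measure with estimated weights by 

$$\mathbb{P}_N^{\pi,e} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat{\alpha}_N}(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{G_e(Z_i^T\hat{\alpha}_N)}\,\delta_{X_i},$$

and the IPW empirical process with estimated weights by $\mathbb{G}_N^{\pi,e} = \sqrt{N}(\mathbb{P}_N^{\pi,e} - P_0)$. 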
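
For the logistic choice $G_e(x) = e^x/(1+e^x)$, the maximizer of the composite likelihood can be computed by Newton-Raphson. The sketch below is one plain NumPy way to do this, not the authors' implementation; the helper name `fit_logistic`, the extra covariate `u`, and the data are hypothetical. It also illustrates a simple property of the score equations: when $Z_i$ contains only the stratum indicators, the fitted probabilities reproduce $n_j/N_j$, and with further covariates the fitted probabilities still sum to $n_j$ within each stratum.

```python
import numpy as np

def fit_logistic(Z, xi, n_iter=50, tol=1e-10):
    """Newton-Raphson maximizer of prod_i G_e(Z_i'a)^xi_i (1 - G_e(Z_i'a))^(1 - xi_i)
    for the logistic link G_e(x) = e^x / (1 + e^x)."""
    alpha = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ alpha)))
        score = Z.T @ (xi - p)                           # gradient of the log-likelihood
        hessian = (Z * (p * (1.0 - p))[:, None]).T @ Z   # negative Hessian
        step = np.linalg.solve(hessian, score)
        alpha += step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

rng = np.random.default_rng(2)
N_j = np.array([60, 40])
n_j = np.array([15, 20])
strata = np.repeat([0, 1], N_j)
xi = np.zeros(N_j.sum())
for j in range(2):
    idx = np.flatnonzero(strata == j)
    xi[rng.choice(idx, size=n_j[j], replace=False)] = 1.0

indicators = np.eye(2)[strata]             # first J elements of Z_i
u = rng.normal(size=N_j.sum())             # hypothetical extra phase I covariate (k = 1)
Z = np.column_stack([indicators, u])

# Stratum indicators only: fitted probabilities equal n_j / N_j.
alpha_strata = fit_logistic(indicators, xi)
pi_strata = 1.0 / (1.0 + np.exp(-(indicators @ alpha_strata)))

# Indicators plus covariate: estimated weights xi_i / pi_hat(V_i).
alpha_hat = fit_logistic(Z, xi)
pi_hat = 1.0 / (1.0 + np.exp(-(Z @ alpha_hat)))
weights = xi / pi_hat
```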



2.1.2. Calibration 

Calibration adjusts weights so that the inverse probability weighted average 
from the phase II sample is equated to the phase I average, whereby the phase 
I information is taken into account for estimation. Consider the problem of 
choosing the weights $\{w_i\}_{i=1}^{N}$ subject to the condition 

$$\frac{1}{N}\sum_{i=1}^{N}\xi_i w_i Z_i = \frac{1}{N}\sum_{i=1}^{N} Z_i. \tag{2.3}$$

Since the $Z_i$'s take values in $\mathbb{R}^k$, this is a system of equations in $\mathbb{R}^k$. In general 
there are many solutions to this system of equations, and the inverse probability 
weights $1/\pi_0(V_i)$ will typically not satisfy it. Because weights differing 
greatly from the inverse probability weights are unlikely to improve on the plain 
weighted likelihood estimates, calibration involves choosing weights closest to 
the inverse probability weights in a certain distance measure. Let $D_i(w, d)$ be 
a distance measure between the weights $w$ and $d$ for the $i$th subject, where for 
every fixed $d > 0$, $D_i(w, d)$ is nonnegative, continuously differentiable with respect 
to $w$ and strictly convex in $w$, and $(d/dw)D_i(w, d)$ is strictly increasing 
and is zero at $w = d$ (see [12] for various choices of $D_i$). The resulting problem 
is a convex optimization problem: find positive weights $w_i$ that minimize 
the average distance $\sum_{i=1}^{N} D_i(w_i, 1/\pi_0(V_i))$ with the constraint (2.3). The 
method of Lagrange multipliers leads to $(d/dw)D_i(w_i, 1/\pi_0(V_i)) + Z_i^T\alpha = 0$ for 
the subjects with $\xi_i = 1$, where $\alpha$ is a Lagrange multiplier. The invertibility 
of $(d/dw)D_i$ leads to the solution $w_i = G_i(Z_i^T\alpha)/\pi_0(V_i)$ for some function $G_i$ 
where $G_i(0) = 1$ and $\dot{G}_i(0) > 0$. Substitution in (2.3) gives the calibration equation 
$N^{-1}\sum_{i=1}^{N}(\xi_i G_i(Z_i^T\alpha)/\pi_0(V_i))Z_i = N^{-1}\sum_{i=1}^{N} Z_i$. The solution $\hat{\alpha}$ to the 
calibration equation determines the calibrated weights $w_i = G_i(Z_i^T\hat{\alpha})/\pi_0(V_i)$. 

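One easy choice, as in [3, 4, 20], is to take the distance measures $D_i$ to be the 
same for all subjects; i.e., $D_i = D$ and $G_i = G_c$ for $i = 1, \ldots, N$. An alternative 
subject-specific choice of the $D_i$'s, leading to "modified calibration", will be 
discussed in the next subsection. In both cases we formulate the calibration 
in terms of the calibration equation rather than the problem of minimizing a 
distance with the inverse probability weights. These assumptions simplify the 
condition on $G_i$'s and $D_i$'s. In our formulation with equal $D_i$'s, we find an 
estimator $\hat{\alpha}_N$ that is the solution for $\alpha \in \mathcal{A}_c \subset \mathbb{R}^k$ of the following calibration 
equation, 

$$\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_c(V_i; \alpha)}{\pi_0(V_i)}\,Z_i = \frac{1}{N}\sum_{i=1}^{N} Z_i, \tag{2.4}$$

where $G_c(V; \alpha) = G(g(V)^T\alpha) = G(Z^T\alpha)$, and $G$ is a known function with 
$G(0) = 1$ and $\dot{G}(0) > 0$. We call $\pi_{\alpha}(V_i) = \pi_0(V_i)/G_c(V_i; \alpha)$ the calibrated 
sampling probability for the $i$th subject. We define the calibrated IPW empirical 
measure by 

$$\mathbb{P}_N^{\pi,c} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat{\alpha}_N}(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_c(V_i; \hat{\alpha}_N)}{\pi_0(V_i)}\,\delta_{X_i},$$

and the calibrated IPW process by $\mathbb{G}_N^{\pi,c} = \sqrt{N}(\mathbb{P}_N^{\pi,c} - P_0)$. 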
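
For the linear choice $G(x) = 1 + x$ (which has $G(0) = 1$ and $\dot{G}(0) = 1 > 0$), the calibration equation $N^{-1}\sum_i \xi_i G(Z_i^T\alpha) Z_i/\pi_0(V_i) = N^{-1}\sum_i Z_i$ is linear in $\alpha$ and can be solved in closed form. The following sketch illustrates this special case only; the helper name and data are hypothetical, and a linear $G$ is unbounded, so it is meant as an illustration of the mechanics rather than as a choice covered by the boundedness assumptions used later in the paper.

```python
import numpy as np

def linear_calibration(Z, xi, pi0):
    """Solve sum_i xi_i (1 + Z_i'a) Z_i / pi0_i = sum_i Z_i for a, i.e. G(x) = 1 + x."""
    d = xi / pi0                              # design weights, zero if not sampled
    A = (Z * d[:, None]).T @ Z                # sum_i xi_i Z_i Z_i' / pi0_i
    b = Z.sum(axis=0) - d @ Z                 # phase I total minus IPW total
    alpha = np.linalg.solve(A, b)
    w = d * (1.0 + Z @ alpha)                 # calibrated weights G(Z_i'a) / pi0_i
    return alpha, w

rng = np.random.default_rng(4)
N_j = np.array([60, 40])
n_j = np.array([15, 20])
N = N_j.sum()
strata = np.repeat([0, 1], N_j)
xi = np.zeros(N)
for j in range(2):
    idx = np.flatnonzero(strata == j)
    xi[rng.choice(idx, size=n_j[j], replace=False)] = 1.0
pi0 = (n_j / N_j)[strata]

# Hypothetical auxiliary variables Z_i = g(V_i): an intercept and one covariate.
Z = np.column_stack([np.ones(N), rng.normal(size=N)])
alpha_hat, w = linear_calibration(Z, xi, pi0)
```

With a nonlinear $G$ (for example an exponential or truncated form) the same equation would be solved iteratively, and linear calibration weights can become negative in small samples, which is one practical reason to consider other choices of $G$.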
2.1.3. Modified calibration 

[12] discussed subject-dependent distance measures $D_i$ when $(d/dw)D_i(w, d) = D(w/d)/q_i$, 
where $D(x)$ is a continuous, strictly increasing function on $\mathbb{R}$ with $D(1) = 0$ 
and $(d/dx)D(1) = 1$, independent of the index $i$, and $q_i > 0$. Solving the convex 
optimization problem in this case with some choice of $q_i$'s leads to the calibration 
equation $N^{-1}\sum_{i=1}^{N}\xi_i(G(q_i Z_i^T\alpha)/\pi_0(V_i))Z_i = N^{-1}\sum_{i=1}^{N} Z_i$ for the inverse 
$G = D^{-1}$ of $D$. Recently the choice $q_i = (1 - \pi_0(V_i))/\pi_0(V_i)$ was proposed by 
[8] in a missing response problem. When $\pi_0(V_i) < 1$, $i = 1, \ldots, N$, this choice 
means that when the sampling probability is larger, the subject contributes 
more to the average distance. Note that $q_i = 0$ when $\pi_0(V_i) = 1$. Although $q_i$ 
must be strictly positive in the original formulation of the problem to minimize 
the distance with the inverse probability weights, $q_i = 0$ is valid in the calibration 
equation. One implication of this choice is that we do not modify the 
weights if subjects are always sampled at phase II. We call the method of choosing 
weights by solving the calibration equation with $q_i = (1 - \pi_0(V_i))/\pi_0(V_i)$ 
modified calibration. 

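In modified calibration, we find the estimator $\hat{\alpha}_N$ that is the solution for 
$\alpha \in \mathcal{A}_{mc} \subset \mathbb{R}^k$ of the following calibration equation: 

$$\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_{mc}(V_i; \alpha)}{\pi_0(V_i)}\,Z_i = \frac{1}{N}\sum_{i=1}^{N} Z_i, \tag{2.5}$$

where 

$$G_{mc}(V; \alpha) = G\left(\frac{1 - \pi_0(V)}{\pi_0(V)}\,g(V)^T\alpha\right) = G\left(\frac{1 - \pi_0(V)}{\pi_0(V)}\,Z^T\alpha\right),$$

and $G$ is a known function with $G(0) = 1$ and $\dot{G}(0) > 0$. We call $\pi_{\alpha}(V_i) = 
\pi_0(V_i)/G_{mc}(V_i; \alpha)$ the calibrated sampling probability with modified calibration 
for the $i$th subject. We define the IPW empirical measure with modified 
calibration by 

$$\mathbb{P}_N^{\pi,mc} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat{\alpha}_N}(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_{mc}(V_i; \hat{\alpha}_N)}{\pi_0(V_i)}\,\delta_{X_i},$$

and the IPW process with modified calibration by $\mathbb{G}_N^{\pi,mc} = \sqrt{N}(\mathbb{P}_N^{\pi,mc} - P_0)$. 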
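
The effect of the factor $q_i = (1 - \pi_0(V_i))/\pi_0(V_i)$ on the weights can be seen in a few lines. This sketch evaluates the modified calibration weights $G(q_i Z_i^T\alpha)/\pi_0(V_i)$ for a given $\alpha$; the function name, the default $G(x) = 1 + x$, and the numbers are illustrative assumptions. It shows that a subject with $\pi_0(V_i) = 1$ keeps weight 1 whatever the value of $\alpha$, because $q_i = 0$ and $G(0) = 1$.

```python
import numpy as np

def modified_calibration_weights(Z, pi0, alpha, G=lambda x: 1.0 + x):
    """Weights G(q_i Z_i'a) / pi0_i with q_i = (1 - pi0_i) / pi0_i."""
    q = (1.0 - pi0) / pi0
    return G(q * (Z @ alpha)) / pi0

# Three hypothetical subjects with sampling probabilities 1, 1/2 and 1/4.
Z = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
pi0 = np.array([1.0, 0.5, 0.25])
alpha = np.array([0.1, 0.2])
w_mc = modified_calibration_weights(Z, pi0, alpha)
```

In practice these weights are applied only to subjects with $\xi_i = 1$, and $\alpha$ is replaced by the solution of the modified calibration equation.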
2.1.4. Centered calibration 

We propose a new method, centered calibration, that calibrates on centered 
auxiliary variables with modified calibration. This method in fact improves the 
plain WLE under our sampling scheme, while retaining the good properties of 
modified calibration. We discuss advantages of centered calibration and connec- 
tions to other methods in Section 3.4. 

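In centered calibration, we find the estimator $\hat{\alpha}_N$ that is the solution for 
$\alpha \in \mathcal{A}_{cc} \subset \mathbb{R}^k$ of the following calibration equation: 

$$\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_{cc}(V_i; \alpha)}{\pi_0(V_i)}\,(Z_i - \bar{Z}_N) = 0, \tag{2.6}$$

where 

$$G_{cc}(V; \alpha) = G\left(\frac{1 - \pi_0(V)}{\pi_0(V)}\,(Z - \bar{Z}_N)^T\alpha\right),$$

with the dependence on $\bar{Z}_N = N^{-1}\sum_{i=1}^{N} Z_i$ suppressed in the notation, and $G$ is a known function 
with $G(0) = 1$ and $\dot{G}(0) > 0$. We call $\pi_{\alpha}(V_i) = \pi_0(V_i)/G_{cc}(V_i; \alpha)$ the calibrated 
sampling probability with centered calibration for the $i$th subject. We define the 
IPW empirical measure with centered calibration by 

$$\mathbb{P}_N^{\pi,cc} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_{\hat{\alpha}_N}(V_i)}\,\delta_{X_i} = \frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i G_{cc}(V_i; \hat{\alpha}_N)}{\pi_0(V_i)}\,\delta_{X_i},$$

and the IPW process with centered calibration by $\mathbb{G}_N^{\pi,cc} = \sqrt{N}(\mathbb{P}_N^{\pi,cc} - P_0)$. 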
2.2. Estimators 

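We study the asymptotic distribution of the weighted likelihood estimator of a 
finite dimensional parameter in a general semiparametric model $\mathcal{P} = \{P_{\theta,\eta} : 
\theta \in \Theta, \eta \in H\}$ where $\Theta \subset \mathbb{R}^p$ and the nuisance parameter space $H$ is a subset 
of some Banach space $\mathcal{B}$. Let $P_0 = P_{\theta_0,\eta_0}$ denote the true distribution. 

The maximum likelihood estimator with complete data is often obtained as 
a solution of the infinite dimensional likelihood equations. In such models, the 
WLE under two-phase sampling is obtained by solving the corresponding infinite 
dimensional inverse probability weighted likelihood equations. Specifically, the 
WLE $(\hat{\theta}_N, \hat{\eta}_N)$ is a solution of the following weighted likelihood equations 

$$
\begin{aligned}
\Psi_{N,1}^{\pi}(\theta, \eta) &= \mathbb{P}_N^{\pi}\dot{\ell}_{\theta,\eta} = o_{P^*}(N^{-1/2}), \\
\|\Psi_{N,2}^{\pi}(\theta, \eta)h\|_{\mathcal{H}} &= \|\mathbb{P}_N^{\pi}(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h)\|_{\mathcal{H}} = o_{P^*}(N^{-1/2}),
\end{aligned}
\tag{2.7}
$$

where $\dot{\ell}_{\theta,\eta} \in L_2^0(P_{\theta,\eta})^p$ is the score function for $\theta$, and the score operator $B_{\theta,\eta} : 
\mathcal{H} \mapsto L_2^0(P_{\theta,\eta})$ is the bounded linear operator mapping a direction $h$ in some 
Hilbert space $\mathcal{H}$ of one-dimensional submodels for $\eta$ along which $\eta \to \eta_0$. The 
corresponding WLE with estimated weights $(\hat{\theta}_{N,e}, \hat{\eta}_{N,e})$, the calibrated WLE 
$(\hat{\theta}_{N,c}, \hat{\eta}_{N,c})$, the WLE with modified calibration $(\hat{\theta}_{N,mc}, \hat{\eta}_{N,mc})$, and the WLE 
with centered calibration $(\hat{\theta}_{N,cc}, \hat{\eta}_{N,cc})$ are obtained by replacing $\mathbb{P}_N^{\pi}$ by $\mathbb{P}_N^{\pi,e}$, 
$\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$ in (2.7), respectively. Let $\dot{\ell}_0 = \dot{\ell}_{\theta_0,\eta_0}$ and $B_0 = B_{\theta_0,\eta_0}$. 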
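
To make the weighted likelihood equations concrete, consider a toy parametric model with no nuisance parameter, which is an illustrative assumption and not one of the paper's examples: $X$ is exponential with rate $\theta$, so the score is $\dot{\ell}_{\theta}(x) = 1/\theta - x$. The WLE then solves $N^{-1}\sum_i \xi_i \dot{\ell}_{\theta}(X_i)/\pi_0(V_i) = 0$, which gives $\hat{\theta}_N = \sum_i w_i / \sum_i w_i X_i$ with $w_i = \xi_i/\pi_0(V_i)$. In the sketch, strata are formed from a hypothetical coarsening of $X$, and only the phase II subjects contribute their values of $X$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
x = rng.exponential(scale=0.5, size=N)      # complete data, true rate theta_0 = 2
strata = (x > 0.5).astype(int)              # hypothetical phase I coarsening of X
p = np.array([0.2, 0.6])                    # target sampling fractions p_j

xi = np.zeros(N)
pi0 = np.empty(N)
for j in range(2):
    idx = np.flatnonzero(strata == j)
    n_j = int(np.ceil(p[j] * idx.size))     # phase II sample size in stratum j
    xi[rng.choice(idx, size=n_j, replace=False)] = 1.0
    pi0[idx] = n_j / idx.size

w = xi / pi0                                # inverse probability weights

def weighted_score(theta):
    """IPW score equation N^{-1} sum_i w_i (1/theta - X_i)."""
    return np.mean(w * (1.0 / theta - x))

theta_hat = w.sum() / np.sum(w * x)         # closed-form root of weighted_score
```

An unweighted fit to the phase II subjects alone would be biased here because the strata depend on $X$ and are sampled at different rates; the weights correct for this.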

3. Asymptotics for the WLE in general semiparametric models 

We consider two cases: in the first case the nuisance parameter $\eta$ is estimable 
at a regular (i.e., $\sqrt{n}$) rate and, for ease of exposition, $\eta$ is assumed to be a 
measure. In the second case $\eta$ is only estimable at a non-regular (slower than 
$\sqrt{n}$) rate. Our theorem (Theorem 3.2) concerning the second case nearly covers 
the former case, but requires slightly more smoothness and a separate proof of 
the rate of convergence for an estimator of $\eta$. On the other hand, our theorem 
(Theorem 3.1) concerning the former case includes a proof of the (regular) ($\sqrt{n}$) 
rate of convergence, and hence is of interest by itself. 
3.1. Conditions for estimating weights and (modified and centered) 
calibration 

To derive asymptotic distributions of WLE's with estimated weights and (modified 
and centered) calibration, we need to establish asymptotic results on estimators 
of $\alpha$ with estimated weights and (modified and centered) calibration. 
To this end, we assume the following. Throughout this paper, we may assume 
both Conditions 3.1 and 3.2 at the same time, but it should be understood 
that the former condition is used exclusively for the estimators regarding estimated 
weights and the latter condition is imposed only for estimators regarding 
(modified and centered) calibration. Moreover, it should be understood that 
Conditions 3.2(a)(i) and 3.2(d)(i) are assumed for the estimators regarding calibration, 
Conditions 3.2(a)(ii) and 3.2(d)(ii) are imposed for the estimators regarding 
modified calibration and that Conditions 3.2(a)(iii) and 3.2(d)(iii) are 
used for the estimators regarding centered calibration, respectively. 

Condition 3.1 (Estimated weights). (a) The estimator $\hat{\alpha}_N$ is a maximizer of 
the composite likelihood (2.2). 

(b) $Z \in \mathbb{R}^{J+k}$ is not concentrated on a $(J + k)$-dimensional affine space of $\mathbb{R}^{J+k}$ 
and has bounded support. 

(c) $G_e : \mathbb{R} \mapsto [0, 1]$ is a twice continuously differentiable, monotone function. 

(d) $S_0 = P_0\left[\{\dot{G}_e(Z^T\alpha_0)\}^2\{\pi_0(V)(1 - \pi_0(V))\}^{-1} Z^{\otimes 2}\right]$ is finite and nonsingular, 
where $\dot{G}_e$ is the derivative of $G_e$. 

(e) The "true" parameter $\alpha_0 = (\alpha_{0,1}, \ldots, \alpha_{0,J+k})$ is given by $\alpha_{0,j} = G_e^{-1}(p_j)$, 
for $j = 1, \ldots, J$, and $\alpha_{0,j} = 0$, for $j = J + 1, \ldots, J + k$. The parameter $\alpha$ is 
identifiable, that is, $p_{\alpha} = p_{\alpha_0}$ almost surely implies $\alpha = \alpha_0$. 

(f) For a fixed $p_j \in (0, 1)$, $n_j$ satisfies $n_j = [N_j p_j]$ for $j = 1, \ldots, J$. 

Condition 3.2 ((Modified and centered) calibration). (a) (i) The estimator 
$\hat{\alpha}_N = \hat{\alpha}_N^{c}$ is a solution of the calibration equation (2.4). (ii) The estimator 
$\hat{\alpha}_N = \hat{\alpha}_N^{mc}$ is a solution of the calibration equation (2.5). (iii) The estimator 
$\hat{\alpha}_N = \hat{\alpha}_N^{cc}$ is a solution of the calibration equation (2.6). 

(b) The distribution of $Z \in \mathbb{R}^k$ is not concentrated at 0 and has bounded support. 

(c) $G$ is a strictly increasing continuously differentiable function on $\mathbb{R}$ such that 
$G(0) = 1$ and for all $x$, $-\infty < m_1 < G(x) < M_1 < \infty$ and $0 < \dot{G}(x) < M_2 < \infty$, 
where $\dot{G}$ is the derivative of $G$. 

(d) (i) $P_0 Z^{\otimes 2}$ is finite and positive definite. (ii) $P_0[\pi_0(V)^{-1}(1 - \pi_0(V))Z^{\otimes 2}]$ is 
finite and positive definite. (iii) $P_0[\pi_0(V)^{-1}(1 - \pi_0(V))(Z - \mu_Z)^{\otimes 2}]$ is finite and 
positive definite where $\mu_Z = P_0 Z$. 

(e) The "true" parameter $\alpha_0 = 0$. 

Condition 3.1(f) may seem unnatural at first, but in practice the phase II 
sample size $n_j$ can be chosen by the investigator so that the sampling probability 
$p_j$ can be understood to be automatically chosen to satisfy $n_j = [N_j p_j]$. The 
other parts of Condition 3.1 are standard in binary regression, and Condition 3.2 
is similar to Condition 3.1. 
Asymptotic properties of $\hat{\alpha}_N$ for all cases (estimated weights and (modified 
and centered) calibration) are proved in [34]. 



3.2. Regular rate for a nuisance parameter 
We assume the following conditions. 

Condition 3.3 (Consistency). The estimator $(\hat{\theta}_N, \hat{\eta}_N)$ is consistent for $(\theta_0, \eta_0)$ 
and solves the weighted likelihood equations (2.7), where $\mathbb{P}_N^{\pi}$ may be replaced by 
$\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$ for the estimators with estimated weights, calibration, 
modified calibration or centered calibration. 

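Condition 3.4 (Asymptotic equicontinuity). Let $\mathcal{F}_1(\delta) = \{\dot{\ell}_{\theta,\eta} : |\theta - \theta_0| + 
\|\eta - \eta_0\| < \delta\}$ and $\mathcal{F}_2(\delta) = \{B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h : h \in \mathcal{H}, |\theta - \theta_0| + \|\eta - \eta_0\| < 
\delta\}$. There exists a $\delta_0 > 0$ such that (1) $\mathcal{F}_k(\delta_0)$, $k = 1, 2$, are $P_0$-Donsker and 
$\sup_{h \in \mathcal{H}} P_0|f_j - f_{0,j}|^2 \to 0$, as $|\theta - \theta_0| + \|\eta - \eta_0\| \to 0$, for every $f_j \in \mathcal{F}_j(\delta_0)$, $j = 
1, 2$, where $f_{0,1} = \dot{\ell}_{\theta_0,\eta_0}$ and $f_{0,2} = B_0 h - P_0 B_0 h$, (2) $\mathcal{F}_k(\delta_0)$, $k = 1, 2$, have 
integrable envelopes. 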

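Condition 3.5. The map $\Psi = (\Psi_1, \Psi_2) : \Theta \times H \mapsto \mathbb{R}^p \times \ell^{\infty}(\mathcal{H})$ with components 

$$
\begin{aligned}
\Psi_1(\theta, \eta) &= P_0\psi_{1,\theta,\eta} = P_0\dot{\ell}_{\theta,\eta}, \\
\Psi_2(\theta, \eta)h &= P_0\psi_{2,\theta,\eta,h} = P_0 B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h, \quad h \in \mathcal{H},
\end{aligned}
$$

has a continuously invertible Fréchet derivative map $\dot{\Psi}_0 = (\dot{\Psi}_{11}, \dot{\Psi}_{12}, \dot{\Psi}_{21}, \dot{\Psi}_{22})$ 
at $(\theta_0, \eta_0)$ given by $\dot{\Psi}_{ij}(\theta_0, \eta_0)h = P_0(\dot{\psi}_{i,j,\theta_0,\eta_0,h})$, $i, j \in \{1, 2\}$, in terms of $L_2(P_0)$ 
derivatives of $\psi_{i,\theta,\eta,h}$; that is, 

$$
\begin{aligned}
\sup_{h \in \mathcal{H}}\left[P_0\left\{\psi_{i,\theta,\eta_0,h} - \psi_{i,\theta_0,\eta_0,h} - \dot{\psi}_{i1,\theta_0,\eta_0,h}(\theta - \theta_0)\right\}^2\right]^{1/2} &= o(\|\theta - \theta_0\|), \\
\sup_{h \in \mathcal{H}}\left[P_0\left\{\psi_{i,\theta_0,\eta,h} - \psi_{i,\theta_0,\eta_0,h} - \dot{\psi}_{i2,\theta_0,\eta_0,h}(\eta - \eta_0)\right\}^2\right]^{1/2} &= o(\|\eta - \eta_0\|).
\end{aligned}
$$

Furthermore, $\dot{\Psi}_0$ admits a partition 

$$
(\theta - \theta_0, \eta - \eta_0) \mapsto
\begin{pmatrix} \dot{\Psi}_{11} & \dot{\Psi}_{12} \\ \dot{\Psi}_{21} & \dot{\Psi}_{22} \end{pmatrix}
\begin{pmatrix} \theta - \theta_0 \\ \eta - \eta_0 \end{pmatrix},
$$

where 

$$
\begin{aligned}
\dot{\Psi}_{11}(\theta - \theta_0) &= -P_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\eta_0}\dot{\ell}_{\theta_0,\eta_0}^T(\theta - \theta_0), \\
\dot{\Psi}_{12}(\eta - \eta_0) &= -\int B_{\theta_0,\eta_0}^{*}\dot{\ell}_{\theta_0,\eta_0}\,d(\eta - \eta_0), \\
\dot{\Psi}_{21}(\theta - \theta_0)h &= -P_{\theta_0,\eta_0}B_{\theta_0,\eta_0}h\,\dot{\ell}_{\theta_0,\eta_0}^T(\theta - \theta_0), \\
\dot{\Psi}_{22}(\eta - \eta_0)h &= -\int B_{\theta_0,\eta_0}^{*}B_{\theta_0,\eta_0}h\,d(\eta - \eta_0),
\end{aligned}
$$

and $B_{\theta_0,\eta_0}^{*}B_{\theta_0,\eta_0}$ is continuously invertible. 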
Let $I_0 = P_0[(I - B_0(B_0^{*}B_0)^{-1}B_0^{*})\dot{\ell}_0\dot{\ell}_0^T]$ be the efficient information for $\theta$ and 
$\tilde{\ell}_0 = I_0^{-1}(I - B_0(B_0^{*}B_0)^{-1}B_0^{*})\dot{\ell}_0$ be the efficient influence function for $\theta$ for the 
semiparametric model with complete data. 

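Theorem 3.1. Under Conditions 3.1-3.5, 

$$\sqrt{N}(\hat\theta_N - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z \sim N_p(0, \Sigma),$$
$$\sqrt{N}(\hat\theta_{N,e} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,e}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_e \sim N_p(0, \Sigma_e),$$
$$\sqrt{N}(\hat\theta_{N,c} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,c}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_c \sim N_p(0, \Sigma_c),$$
$$\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{mc} \sim N_p(0, \Sigma_{mc}),$$
$$\sqrt{N}(\hat\theta_{N,cc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,cc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{cc} \sim N_p(0, \Sigma_{cc}),$$

where 

$$\Sigma \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}(\tilde\ell_0), \qquad (3.8)$$
$$\Sigma_e \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}((I - Q_e)\tilde\ell_0), \qquad (3.9)$$
$$\Sigma_c \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}((I - Q_c)\tilde\ell_0), \qquad (3.10)$$
$$\Sigma_{mc} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}((I - Q_{mc})\tilde\ell_0), \qquad (3.11)$$
$$\Sigma_{cc} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}((I - Q_{cc})\tilde\ell_0), \qquad (3.12)$$

and (recall Conditions 3.1 and 3.2) 

$$Q_e f \equiv P_0[\pi_0^{-1}(V) f \dot G_e(Z^T\alpha_0) Z^T]\, S_0^{-1} (1 - \pi_0(V))^{-1} \dot G_e(Z^T\alpha_0) Z,$$
$$Q_c f \equiv P_0[f Z^T] \{P_0 Z^{\otimes 2}\}^{-1} Z,$$
$$Q_{mc} f \equiv P_0[(\pi_0^{-1}(V) - 1) f Z^T] \{P_0[(\pi_0^{-1}(V) - 1) Z^{\otimes 2}]\}^{-1} Z,$$
$$Q_{cc} f \equiv P_0[(\pi_0^{-1}(V) - 1) f (Z - \mu_Z)^T] \{P_0[(\pi_0^{-1}(V) - 1)(Z - \mu_Z)^{\otimes 2}]\}^{-1} (Z - \mu_Z).$$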

Remark 3.1. Our conditions in Theorem 3.1 are the same as those in [7] 
except for the integrability condition. Our Condition 3.4 (2) requires the existence of 
integrable envelopes for the classes of scores, while condition (A1*) in [7] requires 
square integrable envelopes. Note that this integrability condition is required only 
for the WLE with estimated weights and (modified and centered) calibration, as 
in [6]. 

3.3. Non-regular rate for a nuisance parameter 
Set 

$$B_{\theta,\eta}[\underline{h}] = (B_{\theta,\eta}h_1, \ldots, B_{\theta,\eta}h_p)^T$$






for $\underline{h} = (h_1, \ldots, h_p)^T$ where $h_k \in \mathcal{H}$ for each $k = 1, \ldots, p$. We assume the 
following conditions. 

Condition 3.6 (Consistency and rate of convergence). An estimator $(\hat\theta_N, \hat\eta_N)$ 
of $(\theta_0, \eta_0)$ satisfies $|\hat\theta_N - \theta_0| = o_P(1)$ and $\|\hat\eta_N - \eta_0\| = O_P(N^{-\beta})$ for some 
$\beta > 0$. 



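Condition 3.7 (Positive information). There is an $\underline{h}^* = (h_1^*, \ldots, h_p^*)$, where 
$h_k^* \in \mathcal{H}$ for $k = 1, \ldots, p$, such that 

$$P_0\{(\dot\ell_0 - B_0[\underline{h}^*]) B_0 h\} = 0$$

for all $h \in \mathcal{H}$. Furthermore, the efficient information $I_0 = P_0(\dot\ell_0 - B_0[\underline{h}^*])^{\otimes 2}$ 
for $\theta$ for the semiparametric model with complete data is finite and nonsingular. 
Denote the efficient influence function for the semiparametric model with 
complete data by $\tilde\ell_0 = I_0^{-1}(\dot\ell_0 - B_0[\underline{h}^*])$. 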

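Condition 3.8 (Asymptotic equicontinuity). (1) For any $\delta_N \downarrow 0$ and $C > 0$, 

$$\sup_{|\theta - \theta_0| \le \delta_N,\ \|\eta - \eta_0\| \le CN^{-\beta}} \left|\mathbb{G}_N(\dot\ell_{\theta,\eta} - \dot\ell_0)\right| = o_P(1),$$
$$\sup_{|\theta - \theta_0| \le \delta_N,\ \|\eta - \eta_0\| \le CN^{-\beta}} \left|\mathbb{G}_N(B_{\theta,\eta} - B_0)[\underline{h}^*]\right| = o_P(1).$$

(2) There exists a $\delta > 0$ such that the classes $\{\dot\ell_{\theta,\eta} : |\theta - \theta_0| + \|\eta - \eta_0\| \le \delta\}$ 
and $\{B_{\theta,\eta}[\underline{h}^*] : |\theta - \theta_0| + \|\eta - \eta_0\| \le \delta\}$ are $P_0$-Glivenko-Cantelli and have 
integrable envelopes. Moreover, $\dot\ell_{\theta,\eta}$ and $B_{\theta,\eta}[\underline{h}^*]$ are continuous with respect to 
$(\theta, \eta)$ either pointwise or in $L_1(P_0)$. 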

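Condition 3.9 (Smoothness of the model). For some $\alpha > 1$ satisfying $\alpha\beta > 1/2$ 
and for $(\theta,\eta)$ in the neighborhood $\{(\theta,\eta) : |\theta - \theta_0| \le \delta_N, \|\eta - \eta_0\| \le CN^{-\beta}\}$, 

$$\left|P_0\left\{\dot\ell_{\theta,\eta} - \dot\ell_0 + \dot\ell_0\left(\dot\ell_0^T(\theta - \theta_0) + B_0[\eta - \eta_0]\right)\right\}\right| = o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}),$$
$$\left|P_0\left\{(B_{\theta,\eta} - B_0)[\underline{h}^*] + B_0[\underline{h}^*]\left(\dot\ell_0^T(\theta - \theta_0) + B_0[\eta - \eta_0]\right)\right\}\right| = o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}).$$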

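In the previous section, we required that the WLE solves the weighted likelihood 
equations (2.7) for all $h \in \mathcal{H}$. Here, we only assume that the WLE $(\hat\theta_N, \hat\eta_N)$ 
satisfies the weighted likelihood equations 

$$\mathbb{P}_N^{\pi}\dot\ell_{\hat\theta_N,\hat\eta_N} = o_{P^*}(N^{-1/2}), \qquad \mathbb{P}_N^{\pi}B_{\hat\theta_N,\hat\eta_N}[\underline{h}^*] = o_{P^*}(N^{-1/2}). \qquad (3.13)$$

The corresponding WLE with estimated weights, $(\hat\theta_{N,e}, \hat\eta_{N,e})$, the calibrated 
WLE $(\hat\theta_{N,c}, \hat\eta_{N,c})$, the WLE $(\hat\theta_{N,mc}, \hat\eta_{N,mc})$ with modified calibration and the 
WLE $(\hat\theta_{N,cc}, \hat\eta_{N,cc})$ with centered calibration satisfy (3.13) with $\mathbb{P}_N^{\pi}$ replaced by 
$\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$, respectively. 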

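Theorem 3.2. Suppose that the WLE is a solution of (3.13) where $\mathbb{P}_N^{\pi}$ may be 
replaced by $\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$ for the estimators with estimated weights, 
calibration, modified calibration and centered calibration. Under Conditions 
3.1, 3.2 and 3.6-3.9, 

$$\sqrt{N}(\hat\theta_N - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z \sim N_p(0, \Sigma),$$
$$\sqrt{N}(\hat\theta_{N,e} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,e}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_e \sim N_p(0, \Sigma_e),$$
$$\sqrt{N}(\hat\theta_{N,c} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,c}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_c \sim N_p(0, \Sigma_c),$$
$$\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{mc} \sim N_p(0, \Sigma_{mc}),$$
$$\sqrt{N}(\hat\theta_{N,cc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,cc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{cc} \sim N_p(0, \Sigma_{cc}),$$

where $\Sigma$, $\Sigma_e$, $\Sigma_c$, $\Sigma_{mc}$ and $\Sigma_{cc}$ are as defined in (3.8)-(3.12) of Theorem 3.1, 
but now $I_0$ and $\tilde\ell_0$ are defined in Condition 3.7, and $Q_e$, $Q_c$, $Q_{mc}$ and $Q_{cc}$ are 
defined in Theorem 3.1. 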

Remark 3.2. Our conditions are identical to those of the Z-theorem of [15] 
except for Condition 3.8 (2). This additional condition is not stringent. First, 
the Glivenko-Cantelli condition is usually assumed to prove consistency of 
estimators before deriving asymptotic distributions. Second, a stronger $L_2(P_0)$-continuity 
condition is standard, as is seen in Condition 3.4 (see also 25.8 of [41] 
for a nice discussion of regularity conditions for efficient score equations with 
complete data). Note that the $L_1(P_0)$-continuity condition is only required for the 
WLE's with estimated weights and (modified and centered) calibration. Another 
way to understand the relative weakness of Condition 3.8 (2) is to compare 
it with standard conditions for bootstrapping Z-estimators, because the IPW 
empirical process is closely related to the exchangeably weighted bootstrap empirical 
process. See, for example, conditions A.4 and A.5 in [43]. The differentiability 
condition A.4, which implies continuity, corresponds to our $L_1(P_0)$-continuity 
condition. However, we do not impose a condition similar to the weak $L_2(P_0)$ 
condition A.5. In fact, our Lemma 5.4 in Section 5 with the Glivenko-Cantelli 
condition can be used to relax condition A.5 of [43]. 



3.4. Comparisons of methods 

We compare the asymptotic variances of the five WLE's with respect to the improvement 
from adjusting weights and the change of design. To make these comparisons clear, we 
first need to give a clear statement of the result corresponding to Theorem 3.1 
for stratified Bernoulli sampling. 



3.4.1. Stratified Bernoulli sampling 

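We present asymptotic normality of the WLE's, $\hat\theta_N^{Bern}$, $\hat\theta_{N,e}^{Bern}$, $\hat\theta_{N,c}^{Bern}$, $\hat\theta_{N,mc}^{Bern}$ and 
$\hat\theta_{N,cc}^{Bern}$, under stratified Bernoulli sampling where all subjects are independent 
with the sampling probability $p_j$ if $V \in \mathcal{V}_j$. 

Theorem 3.3. Suppose Conditions 3.1 (except 3.1(f)) and 3.2 hold. Let 
$\xi_i \in \{0,1\}$ and $\xi$ be i.i.d. with $E[\xi|V] = \pi_0(V) = \sum_{j=1}^J p_j I(V \in \mathcal{V}_j)$. 
(1) Suppose that the WLE is a solution of (3.13) where $\mathbb{P}_N^{\pi}$ may be replaced by 
$\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$ for the estimators with estimated weights, calibration, 
modified calibration and centered calibration. Under the same conditions as in 
Theorem 3.1, 

$$\sqrt{N}(\hat\theta_N^{Bern} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z^{Bern} \sim N_p(0, \Sigma^{Bern}),$$
$$\sqrt{N}(\hat\theta_{N,e}^{Bern} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,e}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_e^{Bern} \sim N_p(0, \Sigma_e^{Bern}),$$
$$\sqrt{N}(\hat\theta_{N,c}^{Bern} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,c}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_c^{Bern} \sim N_p(0, \Sigma_c^{Bern}),$$
$$\sqrt{N}(\hat\theta_{N,mc}^{Bern} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{mc}^{Bern} \sim N_p(0, \Sigma_{mc}^{Bern}),$$
$$\sqrt{N}(\hat\theta_{N,cc}^{Bern} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,cc}\tilde\ell_0 + o_{P^*}(1) \rightsquigarrow Z_{cc}^{Bern} \sim N_p(0, \Sigma_{cc}^{Bern}),$$

where 

$$\Sigma^{Bern} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}(\tilde\ell_0)^{\otimes 2}, \qquad (3.14)$$
$$\Sigma_e^{Bern} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}((I - Q_e)\tilde\ell_0)^{\otimes 2}, \qquad (3.15)$$
$$\Sigma_c^{Bern} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}((I - Q_c)\tilde\ell_0)^{\otimes 2}, \qquad (3.16)$$
$$\Sigma_{mc}^{Bern} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}((I - Q_{mc})\tilde\ell_0)^{\otimes 2}, \qquad (3.17)$$
$$\Sigma_{cc}^{Bern} \equiv I_0^{-1} + \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}((I - Q_{cc})\tilde\ell_0)^{\otimes 2}, \qquad (3.18)$$

where $Q_e$, $Q_c$, $Q_{mc}$ and $Q_{cc}$ are defined in Theorem 3.1. 

(2) Under the same conditions as in Theorem 3.2, the same conclusion as in (1) 
holds with $I_0$ and $\tilde\ell_0$ replaced by those defined in Condition 3.7. 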

Comparing the variance-covariance matrices in Theorem 3.3 to those in The- 
orems 3.1 and 3.2, we obtain the following corollary comparing designs. All 
estimators have smaller variances under sampling without replacement. 

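Corollary 3.1. 

$$\Sigma = \Sigma^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \{P_{0|j}\tilde\ell_0\}^{\otimes 2},$$
$$\Sigma_e = \Sigma_e^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \{P_{0|j}(I - Q_e)\tilde\ell_0\}^{\otimes 2},$$
$$\Sigma_c = \Sigma_c^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \{P_{0|j}(I - Q_c)\tilde\ell_0\}^{\otimes 2},$$
$$\Sigma_{mc} = \Sigma_{mc}^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \{P_{0|j}(I - Q_{mc})\tilde\ell_0\}^{\otimes 2},$$
$$\Sigma_{cc} = \Sigma_{cc}^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \{P_{0|j}(I - Q_{cc})\tilde\ell_0\}^{\otimes 2}.$$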
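
As a rough numerical illustration of the design comparison for the plain WLE, the following sketch evaluates (3.8) and (3.14) for a scalar influence function. The function name `design_variances` and all stratum moments below are hypothetical toy inputs, not quantities from the paper; the sketch also assumes that $I_0^{-1}$ equals the overall variance of a mean-zero influence function.

```python
import numpy as np

def design_variances(nu, p, mean_j, var_j):
    """Asymptotic variances of the plain WLE for a scalar influence function.

    nu[j]     : stratum probability P(V in V_j)
    p[j]      : phase II sampling probability in stratum j
    mean_j[j] : within-stratum mean of the influence function
    var_j[j]  : within-stratum variance of the influence function
    Returns (sigma_wor, sigma_bern), i.e. (3.8) and (3.14).
    """
    nu, p = np.asarray(nu, float), np.asarray(p, float)
    mean_j, var_j = np.asarray(mean_j, float), np.asarray(var_j, float)
    second_moment_j = var_j + mean_j**2           # P_{0|j} of the squared influence function
    i0_inv = np.sum(nu * second_moment_j)         # overall variance (overall mean is zero)
    w = nu * (1.0 - p) / p                        # nu_j (1 - p_j) / p_j
    sigma_wor = i0_inv + np.sum(w * var_j)             # sampling without replacement
    sigma_bern = i0_inv + np.sum(w * second_moment_j)  # Bernoulli sampling
    return sigma_wor, sigma_bern

# Toy inputs: two strata, overall mean 0.7 * (-0.3) + 0.3 * 0.7 = 0.
sigma_wor, sigma_bern = design_variances(
    nu=[0.7, 0.3], p=[0.2, 0.8], mean_j=[-0.3, 0.7], var_j=[0.9, 1.2])
gap = sigma_bern - sigma_wor   # equals sum_j nu_j (1 - p_j)/p_j * mean_j**2
```

The gap is a weighted sum of squared within-stratum means, so in this toy setting it is nonnegative and vanishes only when the influence function has mean zero in every stratum.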

Variance formulae (3.15), (3.17) and (3.18) have the following alternative rep- 
resentations which show the efficiency gains over the plain WLE under Bernoulli 
sampling. 

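Corollary 3.2. Under the same conditions as in Theorem 3.3, 

$$\Sigma_e^{Bern} = \Sigma^{Bern} - \mathrm{Var}\left(\frac{\xi - \pi_0(V)}{\pi_0(V)} Q_e\tilde\ell_0\right),$$
$$\Sigma_{mc}^{Bern} = \Sigma^{Bern} - \mathrm{Var}\left(\frac{\xi - \pi_0(V)}{\pi_0(V)} Q_{mc}\tilde\ell_0\right),$$
$$\Sigma_{cc}^{Bern} = \Sigma^{Bern} - \mathrm{Var}\left(\frac{\xi - \pi_0(V)}{\pi_0(V)} Q_{cc}\tilde\ell_0\right).$$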

Thus modified calibration and centered calibration yield improved efficiency 
over the plain WLE, while (ordinary) calibration does not yield a guaranteed 
improvement in general. We do not have similar formulas for sampling without 
replacement except for the special case involving within-stratum calibration 
described in part (2) of Corollary 3.3 below. 
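
The efficiency gain of modified calibration under Bernoulli sampling comes from the fact that $Q_{mc}$ is a projection in the inner product weighted by $\pi_0^{-1}(V) - 1$. The following sketch checks the resulting variance decomposition exactly on a toy discrete distribution with equal-mass atoms. The variables `z` and `ell` are simulated stand-ins for a scalar auxiliary variable and a scalar influence function; they are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 2000                                   # atoms of a toy discrete distribution
stratum = rng.integers(0, 2, size=m)       # two strata
p = np.array([0.25, 0.6])                  # hypothetical sampling probabilities p_j
pi0 = p[stratum]                           # pi_0(V) at each atom
z = rng.normal(size=m) + stratum           # scalar auxiliary variable Z
ell = 0.8 * z + rng.normal(size=m)         # stand-in for the influence function
ell -= ell.mean()                          # influence functions have mean zero

w = 1.0 / pi0 - 1.0                        # (1 - pi_0) / pi_0
# Q_mc applied to ell: weighted least-squares projection onto Z.
q_ell = np.mean(w * ell * z) / np.mean(w * z * z) * z

i0_inv = np.mean(ell**2)
sigma_bern = i0_inv + np.mean(w * ell**2)                # (3.14)
sigma_mc_bern = i0_inv + np.mean(w * (ell - q_ell)**2)   # (3.17)
# Var((xi - pi_0)/pi_0 * Q_mc ell) when xi | V is Bernoulli(pi_0(V)).
var_term = np.mean(w * q_ell**2)
```

In this sketch `sigma_mc_bern` agrees with `sigma_bern - var_term` up to rounding, which is the scalar version of the second identity in Corollary 3.2.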



3.4.2. Within-stratum adjustment of weights 

[3] proposed calibration within each stratum to improve the calibrated WLE. 
Let $Z_{(j)} \equiv I(V \in \mathcal{V}_j)Z^T$ and $\tilde Z \equiv (Z_{(1)}, \ldots, Z_{(J)})^T$, and consider calibration 
on $\tilde Z$. The calibration equation (2.4) becomes 

$$\frac{1}{N}\sum_{i=1}^N \frac{\xi_i G_c(V_i; \alpha)}{\pi_0(V_i)} Z_i I(V_i \in \mathcal{V}_j) = \frac{1}{N}\sum_{i=1}^N Z_i I(V_i \in \mathcal{V}_j), \quad j = 1, \ldots, J,$$

where $\alpha \in \mathbb{R}^{Jk}$. We call this special case within-stratum calibration. We define 
within-stratum modified and centered calibration analogously. 

We also call the method of adjusting weights within-stratum estimated weights 
when binary regression is done within each stratum. Recall that the first $J$ 
elements of $Z$ for estimated weights are stratum membership indicators and the 
rest are other auxiliary variables, say $Z^{[2]}$. Within-stratum estimated weights 
uses $\tilde Z \equiv (Z_{(1)}, \ldots, Z_{(J)})^T$ where $Z_{(j)} \equiv I(V \in \mathcal{V}_j)(Z^{[2]})^T$ with 1 included in 
$Z^{[2]}$. The "true" parameter $\tilde\alpha_0$ has zero for all elements except having $G_e^{-1}(p_j)$ 
for the element corresponding to $I(V \in \mathcal{V}_j)$, $j = 1, \ldots, J$. 

The following corollary summarizes within-stratum adjustment of weights 
under stratified Bernoulli sampling and sampling without replacement. 

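Corollary 3.3. (1) (Bernoulli) Under the same conditions as in Theorem 3.3 
with $Z$ replaced by $\tilde Z$ and $\alpha_0$ replaced by $\tilde\alpha_0$ for within-stratum estimated weights, 

$$\Sigma_e^{Bern} = \Sigma^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}(Q_e^{(j)}\tilde\ell_0)^{\otimes 2}, \qquad (3.19)$$
$$\Sigma_c^{Bern} = \Sigma^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}(Q_c^{(j)}\tilde\ell_0)^{\otimes 2}, \qquad (3.20)$$
$$\Sigma_{cc}^{Bern} = \Sigma^{Bern} - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} P_{0|j}(Q_{cc}^{(j)}\tilde\ell_0)^{\otimes 2}, \qquad (3.21)$$

where 

$$Q_e^{(j)} f \equiv P_{0|j}[f \dot G_e(\tilde Z^T\tilde\alpha_0)(Z^{[2]})^T] \{P_{0|j}\dot G_e^2(\tilde Z^T\tilde\alpha_0)(Z^{[2]})^{\otimes 2}\}^{-1} \times \dot G_e(\tilde Z^T\tilde\alpha_0) I(V \in \mathcal{V}_j) Z^{[2]},$$
$$Q_c^{(j)} f \equiv P_{0|j}[f Z^T] \{P_{0|j}[Z^{\otimes 2}]\}^{-1} I(V \in \mathcal{V}_j) Z,$$
$$Q_{cc}^{(j)} f \equiv P_{0|j}[f (Z - \mu_{Z,j})^T] \{P_{0|j}[(Z - \mu_{Z,j})^{\otimes 2}]\}^{-1} I(V \in \mathcal{V}_j)(Z - \mu_{Z,j}),$$

with $\mu_{Z,j} = E[I(V \in \mathcal{V}_j)Z]$ for $j = 1, \ldots, J$. 

(2) (without replacement) Under the same conditions as in Theorem 3.1 or 
Theorem 3.2 with $Z$ replaced by $\tilde Z$, 

$$\Sigma_{cc} = \Sigma - \sum_{j=1}^J \nu_j \frac{1-p_j}{p_j} \mathrm{Var}_{0|j}(Q_{cc}^{(j)}\tilde\ell_0). \qquad (3.22)$$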



3.4.3. Comparisons 

We summarize Corollaries 3.1-3.3. All estimators have reduced variance under 
the sampling without replacement design in comparison to Bernoulli sampling. 
Every method of adjusting weights improves efficiency over the plain WLE in 
a certain design and with a certain range of adjustment of weights (within- 
stratum or "across-strata" adjustment). However, particularly notable among 
all methods is centered calibration. While other methods gain efficiency only 
under stratified Bernoulli sampling, centered calibration improves efficiency over 
the plain WLE under both sampling schemes. There is no known method of 
"across-strata" adjustment that is guaranteed to gain efficiency over the plain 
WLE under stratified sampling without replacement. 

There are close connections among all methods. When the auxiliary variables 
have mean zero, then centered and modified calibration are essentially the same. 
Within-stratum calibration and within-stratum modified calibration give the 
same asymptotic variance. For $Z$ and $\alpha_0$ defined for estimated weights and $\tilde Z$ 
and $\tilde\alpha_0$ defined for within-stratum estimated weights, modified calibration based 
on $(1 - \pi_0(V))^{-1}\dot G_e(Z^T\alpha_0)Z$ and within-stratum calibration based on $\dot G_e(\tilde Z^T\tilde\alpha_0)$ 
perform in the same way as the estimated weights and within-stratum estimated 
weights, respectively. Because of these connections among methods, there is 
no single method superior to others in each scenario. Performance depends on 
the choice and transformation of auxiliary variables, the true distribution $P_0$, and 
the design. For our sampling scheme, within-stratum centered calibration is the 
only method guaranteed to gain efficiency, while other methods may perform 
even worse than the plain WLE. 

4. Examples 

To prove asymptotic normality of WLE's, consistency and rate of convergence 
need to be established in order to apply our Z-theorems in the previous section. 
To this end, general results on IPW empirical processes discussed in the next 
section will be useful. We illustrate this in the Cox proportional hazards models 
with right censoring and interval censoring under two-phase sampling. 

Let $T \sim F$ be a failure time, and $X$ be a vector of covariates with bounded 
support in the regression model. The Cox model ([11]) specifies the relationship 

$$\Lambda(t|x) = \exp(\theta^T x)\Lambda(t),$$

where $\theta \in \Theta \subset \mathbb{R}^p$ is the regression parameter and $\Lambda \in H$ is the (baseline) cumulative 
hazard function. Here the space $H$ for the nuisance parameter $\Lambda$ is the 
set of nonnegative, nondecreasing cadlag functions defined on the positive line. 
The true parameters are $\theta_0$ and $\Lambda_0$. 

In addition to $X$, let $U$ be a vector of auxiliary variables collected at phase I 
which are correlated with the covariate $X$. For simplicity of notation, we assume 
that the covariates $X$ are only observed for the subjects sampled at phase II. 
Thus, if some covariates $X$ are available at phase I, then we include an identical 
copy $X'$ of $X$ in the vector $U$. 

4.1. Cox model with right censored data 

Under right censoring, we only observe the minimum of the failure time $T$ and 
the censoring time $C \sim G$. Define the observed time $Y = T \wedge C$ and the censoring 
indicator $\Delta = I(T \le C)$. The phase I data is $V = (Y, \Delta, U)$, and the observed 
data is $(Y, \Delta, \xi X, U, \xi)$ where $\xi$ is the sampling indicator. 
We assume the following conditions. 

Condition 4.1. The finite-dimensional parameter space $\Theta$ is compact and contains 
the true parameter $\theta_0$ as an interior point. 

Condition 4.2. The failure time $T$ and the censoring time $C$ are conditionally 
independent given $X$, and there is $\tau > 0$ such that $P(T > \tau) > 0$ and 
$P(C \ge \tau) = P(C = \tau) > 0$. Both $T$ and $C$ have continuous conditional densities 
given the covariates $X = x$. 

Condition 4.3. The covariate $X$ has bounded support. For any measurable 
function $h$, $P(X \ne h(Y)) > 0$. 

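Let $\lambda(t) = (d/dt)\Lambda(t)$ be the baseline hazard function. With complete data, 
the density of $(Y, \Delta, X)$ is 

$$p_{\theta,\Lambda}(y,\delta,x) = \left(\lambda(y)e^{\theta^Tx - \Lambda(y)e^{\theta^Tx}}(1 - G)(y|x)\right)^{\delta}\left(e^{-\Lambda(y)e^{\theta^Tx}}g(y|x)\right)^{1-\delta}p_X(x),$$

where $p_X$ is the marginal density of $X$ and $g(\cdot|x)$ is the conditional density of 
$C$ given $X = x$. The score for $\theta$ is given by 

$$\dot\ell_{\theta,\Lambda}(y,\delta,x) = x\left(\delta - e^{\theta^Tx}\Lambda(y)\right),$$

and the score operator $B_{\theta,\Lambda} : \mathcal{H} \to L_2(P_{\theta,\Lambda})$ is defined on the unit ball $\mathcal{H}$ in the 
space $BV[0,\tau]$ such that 

$$B_{\theta,\Lambda}h(y,\delta,x) = \delta h(y) - e^{\theta^Tx}\int_{[0,y]} h\, d\Lambda.$$

Because the likelihood based on the density above does not yield the MLE with 
complete data, we define the log likelihood for one observation with complete 
data by 

$$\ell_{\theta,\Lambda}(y,\delta,x) = \log\left\{\left(e^{\theta^Tx}\Lambda\{y\}\right)^{\delta}e^{-e^{\theta^Tx}\Lambda(y)}\right\} = \delta\log\Lambda\{y\} + \delta\theta^Tx - e^{\theta^Tx}\Lambda(y),$$

where $\Lambda\{t\}$ is the (point) mass of $\Lambda$ at $t$. Then maximizing the weighted log 
likelihood $\mathbb{P}_N^{\pi}\ell_{\theta,\Lambda}$ reduces to solving the system of equations $\mathbb{P}_N^{\pi}\dot\ell_{\theta,\Lambda} = 0$ and 
$\mathbb{P}_N^{\pi}B_{\theta,\Lambda}h = 0$ for every $h \in \mathcal{H}$. The efficient score for $\theta$ with complete data is 
given by 

$$\ell^*_{\theta_0,\Lambda_0}(y,\delta,x) = \delta\left(x - \frac{M_1}{M_0}(y)\right) - e^{\theta_0^Tx}\int_{[0,y]}\left(x - \frac{M_1}{M_0}(t)\right)d\Lambda_0(t),$$

and the efficient information for $\theta$ with complete data is 

$$I_{\theta_0,\Lambda_0} = P_{\theta_0,\Lambda_0}(\ell^*_{\theta_0,\Lambda_0})^{\otimes 2} = E\,e^{\theta_0^TX}\int_0^{\tau}\left(X - \frac{M_1}{M_0}(y)\right)^{\otimes 2}(1 - G)(y|X)\,d\Lambda_0(y),$$

where 

$$M_k(s) = P_{\theta_0,\Lambda_0}X^ke^{\theta_0^TX}I(Y \ge s), \quad k = 0, 1.$$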



Theorem 4.1 (Consistency). Under Conditions 3.1, 3.2, 4.1-4.3, the WLE's 
are consistent for $(\theta_0, \Lambda_0)$. 

Proof. We only consider the WLE with modified calibration. Proofs for the other 
four estimators are similar. Our proof closely follows the consistency proof for 
the MLE with complete data in [39]. 

Because of the assumption on $\tau$, we restrict our attention to the interval 
$[0,\tau]$. For a bounded function $h \in L_2(\Lambda)$, define a perturbation $d\hat\Lambda_{N,mc,t} = 
(1 + th)\,d\hat\Lambda_{N,mc}$ of $\hat\Lambda_{N,mc}$. The weighted log likelihood with modified calibration, 
$\mathbb{P}_N^{\pi,mc}\ell_{\theta,\Lambda}$, evaluated at $(\hat\theta_{N,mc}, \hat\Lambda_{N,mc,t})$ and viewed as a function of $t$ is maximal at 
$t = 0$ by the definition of the WLE with modified calibration. Thus, differentiating 
at $t = 0$ we obtain $\mathbb{P}_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}h = 0$, or 

$$\mathbb{P}_N^{\pi,mc}\Delta h(Y) = \mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}\int_{[0,Y]} h\, d\hat\Lambda_{N,mc}.$$

Let $M_{N,0}(s) = \mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}I(Y \ge s)$. Replacing $h$ in the above display by 
$h/M_{N,0}$ yields 

$$\hat\Lambda_{N,mc}h = \int h\, d\hat\Lambda_{N,mc} = \mathbb{P}_N^{\pi,mc}\Delta h(Y)/M_{N,0}(Y).$$

Similar reasoning via $P_0B_0h = 0$ leads to $\Lambda_0h = P_0\Delta h(Y)/M_0(Y)$. Let 

$$\tilde\Lambda_Nh = \mathbb{P}_N^{\pi,mc}\Delta h(Y)/M_0(Y).$$

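Since $P(T > \tau) > 0$ and $P(C = \tau) > 0$, we have for $s \le \tau$ that $M_0(s) \ge 
M_0(\tau) > 0$. The function $(y,\delta) \mapsto \delta h(y)/M_0(y)$ is bounded, and therefore 
$\{\delta h(y)/M_0(y) : h \in \mathcal{H}\}$ is Glivenko-Cantelli by the Glivenko-Cantelli preservation 
theorem [40] and the fact that $\mathcal{H}$ is Glivenko-Cantelli. Thus, $\|\tilde\Lambda_N\|_{\mathcal{H}} \to_{P^*} 
\|P_{\theta_0,\Lambda_0}\Delta h(Y)/M_0(Y)\|_{\mathcal{H}} = \|\Lambda_0\|_{\mathcal{H}}$. Moreover, since 

$$\hat\Lambda_{N,mc}\{Y_i\} = N^{-1}\left(\xi_i/\pi_{\hat\alpha}(V_i)\right)\left(\Delta_i/M_{N,0}(Y_i)\right),$$

and similarly 

$$\tilde\Lambda_N\{Y_i\} = N^{-1}\left(\xi_i/\pi_{\hat\alpha}(V_i)\right)\left(\Delta_i/M_0(Y_i)\right),$$

it follows that 

$$\hat\Lambda_{N,mc}\{Y_i\}/\tilde\Lambda_N\{Y_i\} = M_0(Y_i)/M_{N,0}(Y_i).$$

Since the weighted log likelihood with modified calibration evaluated at $(\hat\theta_{N,mc}, \hat\Lambda_{N,mc})$ 
is larger than at $(\theta_0, \tilde\Lambda_N)$, we have 

$$0 \le \mathbb{P}_N^{\pi,mc}\ell_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}} - \mathbb{P}_N^{\pi,mc}\ell_{\theta_0,\tilde\Lambda_N}$$
$$= (\hat\theta_{N,mc} - \theta_0)^T\mathbb{P}_N^{\pi,mc}\Delta X - \mathbb{P}_N^{\pi,mc}\left(e^{\hat\theta_{N,mc}^TX}\hat\Lambda_{N,mc}(Y) - e^{\theta_0^TX}\tilde\Lambda_N(Y)\right) + \mathbb{P}_N^{\pi,mc}\Delta\log\frac{M_0(Y)}{M_{N,0}(Y)}.$$

We take the limit of this as $N \to \infty$. Because $\Theta$ is compact, there is a subsequence 
of $\{\hat\theta_{N,mc}\}$ that converges to $\theta_\infty \in \Theta$. It follows by Theorem 5.1 that along the 
convergent subsequence of $\{\hat\theta_{N,mc}\}$ 

$$(\hat\theta_{N,mc} - \theta_0)^T\mathbb{P}_N^{\pi,mc}\Delta X \to_{P^*} (\theta_\infty - \theta_0)^TP_{\theta_0,\Lambda_0}\Delta X.$$

For the second term, note that $\hat\Lambda_{N,mc}(\tau)$ is uniformly bounded, because $e^{\theta^TX}$ is 
uniformly bounded in $\theta$ and $X$, and $\hat\Lambda_{N,mc}(\tau)\mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}I(Y = \tau) \le \mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}\hat\Lambda_{N,mc}(Y) = 
\mathbb{P}_N^{\pi,mc}\Delta \le 1$. Here we use the weighted likelihood equation with $h = 1$ above. 
Since $\{\hat\Lambda_{N,mc}\}$ and $\{\tilde\Lambda_N\}$ are both subsets of the class of monotone, bounded 
cadlag functions that is Glivenko-Cantelli, it follows by the Glivenko-Cantelli 
preservation theorem [40] and Theorem 5.1 that 

$$\mathbb{P}_N^{\pi,mc}\left(e^{\hat\theta_{N,mc}^TX}\hat\Lambda_{N,mc}(Y) - e^{\theta_0^TX}\tilde\Lambda_N(Y)\right) = P_{\theta_0,\Lambda_0}\left(e^{\hat\theta_{N,mc}^TX}\hat\Lambda_{N,mc}(Y) - e^{\theta_0^TX}\tilde\Lambda_N(Y)\right) + o_{P^*}(1), \qquad (4.23)$$

along a subsequence of $\hat\theta_{N,mc}$. 

For the third term, note that $\{M_{N,0}\}$ is a subset of the class of monotone, 
bounded, cadlag functions, which is Glivenko-Cantelli, and hence so is it. Note 
also that $M_{N,0}(\tau) = \mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}I(Y = \tau)$ is bounded away from zero with 
probability tending to 1 since $P(T > \tau) > 0$ and $P(C = \tau) > 0$. Since $M_{N,0}(t) \ge 
M_{N,0}(\tau)$ for $t \le \tau$, the set $\{\delta\log(M_0(y)/M_{N,0}(y))\}$ is Glivenko-Cantelli by the 
Glivenko-Cantelli preservation theorem again so that 

$$\mathbb{P}_N^{\pi,mc}\Delta\log(M_0(Y)/M_{N,0}(Y)) = P_{\theta_0,\Lambda_0}\Delta\log(M_0(Y)/M_{N,0}(Y)) + o_{P^*}(1) \qquad (4.24)$$

by Theorem 5.1. 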

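The set $\{\delta h(y)/M_{N,0}(y) : h \in \mathcal{H}\}$ is Glivenko-Cantelli by the Glivenko-Cantelli 
preservation theorem [40] so that $\|\hat\Lambda_{N,mc}\|_{\mathcal{H}} = \|P_{\theta_0,\Lambda_0}\Delta h(Y)/M_{N,0}(Y)\|_{\mathcal{H}} + o_{P^*}(1)$ 
by Theorem 5.1. Since we have by Theorem 5.1 that 

$$M_{N,0}(s) = \mathbb{P}_N^{\pi,mc}e^{\hat\theta_{N,mc}^TX}I(Y \ge s) \to_{P^*} P_{\theta_0,\Lambda_0}e^{\theta_\infty^TX}I(Y \ge s) \equiv M_{\infty,0}(s)$$

uniformly in $s$, it follows by the dominated convergence theorem that 

$$\|\hat\Lambda_{N,mc}\|_{\mathcal{H}} = \|P_{\theta_0,\Lambda_0}\Delta h(Y)/M_{N,0}(Y)\|_{\mathcal{H}} + o_{P^*}(1) \to_{P^*} \|P_{\theta_0,\Lambda_0}\Delta h(Y)/M_{\infty,0}(Y)\|_{\mathcal{H}} \equiv \|\Lambda_\infty\|_{\mathcal{H}},$$

along a subsequence of $\hat\theta_{N,mc}$. 

Apply the dominated convergence theorem to replace $\hat\Lambda_{N,mc}$, $\tilde\Lambda_N$, and $M_{N,0}$ 
by $\Lambda_\infty$, $\Lambda_0$ and $M_{\infty,0}$ in (4.23) and (4.24) and conclude 

$$0 \le (\theta_\infty - \theta_0)^TP_{\theta_0,\Lambda_0}\Delta X - P_{\theta_0,\Lambda_0}\left(e^{\theta_\infty^TX}\Lambda_\infty(Y) - e^{\theta_0^TX}\Lambda_0(Y)\right) + P_{\theta_0,\Lambda_0}\Delta\log\frac{M_0(Y)}{M_{\infty,0}(Y)}. \qquad (4.25)$$

Since $M_0/M_{\infty,0} = d\Lambda_\infty/d\Lambda_0$, the right-hand side of (4.25) is in fact minus one times the 
Kullback-Leibler divergence $K(P_{\theta_0,\Lambda_0}, P_{\theta_\infty,\Lambda_\infty})$ 
for the complete data model. Thus, the right-hand side of (4.25) is exactly zero. But since $K(P_{\theta_0,\Lambda_0}, P_{\theta,\Lambda})$ 
is strictly positive unless $(\theta, \Lambda) = (\theta_0, \Lambda_0)$ by the identifiability of parameters, 
we must have $(\theta_\infty, \Lambda_\infty) = (\theta_0, \Lambda_0)$. This is true for any subsequence of $\hat\theta_{N,mc}$, 
and the result follows. □ 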
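
The jump formula used in the proof, $\Lambda\{Y_i\} = N^{-1}w_i\Delta_i/M_{N,0}(Y_i)$ with $M_{N,0}(s) = N^{-1}\sum_k w_k e^{\theta^TX_k}I(Y_k \ge s)$, can be sketched numerically for a fixed $\theta$ as follows. This is only an illustration under stated assumptions: the function name `weighted_breslow_jumps` is hypothetical, `weights` stands for whichever weights $w_i$ are in use (for example $\xi_i/\pi(V_i)$), and covariates of unsampled subjects are assumed to be filled with zeros since their weight is zero.

```python
import numpy as np

def weighted_breslow_jumps(y, delta, x, theta, weights):
    """Return (jumps, m_n0) for a fixed theta.

    jumps[i] = w_i * delta_i / (N * M_{N,0}(Y_i)),
    m_n0[i]  = N^{-1} * sum_k w_k * exp(theta' x_k) * I(Y_k >= Y_i).
    """
    y = np.asarray(y, float)
    delta = np.asarray(delta, float)
    x = np.asarray(x, float).reshape(len(y), -1)
    theta = np.atleast_1d(np.asarray(theta, float))
    w = np.asarray(weights, float)
    n_total = len(y)
    risk = w * np.exp(x @ theta)                 # w_k * exp(theta' x_k)
    at_risk = y[None, :] >= y[:, None]           # at_risk[i, k] = I(Y_k >= Y_i)
    m_n0 = (at_risk * risk[None, :]).sum(axis=1) / n_total
    jumps = np.zeros(n_total)
    pos = m_n0 > 0                               # guard against empty risk sets
    jumps[pos] = w[pos] * delta[pos] / (n_total * m_n0[pos])
    return jumps, m_n0

# With unit weights and theta = 0 the jumps reduce to delta_i / (number at risk).
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
delta_demo = np.array([1, 0, 1, 1])
x_demo = np.zeros((4, 1))
jumps_demo, _ = weighted_breslow_jumps(y_demo, delta_demo, x_demo, [0.0], np.ones(4))
```

With strictly positive weights, the resulting step function satisfies the weighted likelihood equation with $h = 1$ from the proof, that is, the weighted mean of $\Delta$ equals the weighted mean of $e^{\theta^TX}\Lambda(Y)$.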

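We apply our Z-theorem (Theorem 3.1) to show the asymptotic normality of 
the WLE's. 

Theorem 4.2 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.1-4.3, 

$$\sqrt{N}(\hat\theta_N - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma),$$
$$\sqrt{N}(\hat\theta_{N,e} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,e}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_e),$$
$$\sqrt{N}(\hat\theta_{N,c} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,c}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_c),$$
$$\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_{mc}),$$
$$\sqrt{N}(\hat\theta_{N,cc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,cc}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_{cc}),$$

where $\tilde\ell_{\theta_0,\Lambda_0} = I_{\theta_0,\Lambda_0}^{-1}\ell^*_{\theta_0,\Lambda_0}$ is the efficient influence function for $\theta$ with complete 
data, and $\Sigma$, $\Sigma_e$, $\Sigma_c$, $\Sigma_{mc}$ and $\Sigma_{cc}$ are given in Theorem 3.1. 

Proof. We verify the conditions of Theorem 3.1. Condition 3.3 holds by Theorem 
4.1. Conditions 3.4 and 3.5 hold under the present hypotheses as was shown in 
[41], section 25.12. □ 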

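For variance estimation regarding $\hat\theta_N$, $\hat I_N = \mathbb{P}_N^{\pi}(\ell^*_{\hat\theta_N,\hat\Lambda_N})^{\otimes 2}$ can be used 
to estimate $I_0$. Letting $\hat\ell_0 = \hat I_N^{-1}\ell^*_{\hat\theta_N,\hat\Lambda_N}$, we can estimate $\mathrm{Var}_{0|j}\tilde\ell_0$ by 
$\mathbb{P}_{N,j}^{\pi}\hat\ell_0^{\otimes 2} - (\mathbb{P}_{N,j}^{\pi}\hat\ell_0)^{\otimes 2}$, where $\mathbb{P}_{N,j}^{\pi}\hat\ell_0 = \mathbb{P}_N^{\pi}\hat\ell_0I(V \in \mathcal{V}_j)$ and 
$\mathbb{P}_{N,j}^{\pi}\hat\ell_0^{\otimes 2} = \mathbb{P}_N^{\pi}\hat\ell_0^{\otimes 2}I(V \in \mathcal{V}_j)$. The 
other four cases are similar. 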



4.2. Cox model with interval censored data 

Let $Y$ be a censoring time that is assumed to be conditionally independent of a 
failure time $T$ given a covariate vector $X$. Under case 1 interval censoring, 
we do not observe $T$ but $(Y, \Delta)$ where $\Delta \equiv I(T \le Y)$. The phase I data is 
$V = (Y, \Delta, U)$ and the observed data is $(Y, \Delta, \xi X, U, \xi)$ where $\xi$ is the sampling 
indicator. 

With complete data, the log likelihood for one observation is given by 

$$\ell(\theta, F) = \delta\log\left\{1 - \bar F(y)^{\exp(\theta^Tx)}\right\} + (1 - \delta)\log\bar F(y)^{\exp(\theta^Tx)}$$
$$= \delta\log\left\{1 - \exp(-\Lambda(y)\exp(\theta^Tx))\right\} - (1 - \delta)\exp(\theta^Tx)\Lambda(y)$$
$$\equiv \ell(\theta, \Lambda),$$

where $F$, $\bar F$, and $\Lambda$ are related by $\bar F \equiv 1 - F = \exp(-\Lambda)$. The WLE $(\hat\theta_N, \hat\Lambda_N)$ 
of $(\theta, \Lambda)$ maximizes $\mathbb{P}_N^{\pi}\ell(\theta, \Lambda)$. 

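The score for $\theta$ and the score operator $B_{\theta,\Lambda}$ for $\Lambda$ with complete data are 

$$\dot\ell_{\theta,\Lambda} = x\exp(\theta^Tx)\Lambda(y)\left(\delta r(y, x; \theta, \Lambda) - (1 - \delta)\right),$$
$$B_{\theta,\Lambda}[h] = \exp(\theta^Tx)h(y)\left\{\delta r(y, x; \theta, \Lambda) - (1 - \delta)\right\},$$

where 

$$r(y, x; \theta, \Lambda) = \frac{\exp(-\exp(\theta^Tx)\Lambda(y))}{1 - \exp(-\exp(\theta^Tx)\Lambda(y))}.$$

The efficient score for $\theta$ with complete data is given by 

$$\ell^*_{\theta_0,\Lambda_0} = e^{\theta_0^Tx}Q(y, \delta, x; \theta_0, \Lambda_0)\Lambda_0(y)\left\{x - \frac{E[Xe^{2\theta_0^TX}O(Y|X)\,|\,Y = y]}{E[e^{2\theta_0^TX}O(Y|X)\,|\,Y = y]}\right\},$$

where $Q(y, \delta, x; \theta, \Lambda) = \delta r(y, x; \theta, \Lambda) - (1 - \delta)$ and $O(y|x) = r(y, x; \theta_0, \Lambda_0)$. The 
efficient information for $\theta$ with complete data is 

$$I(\theta_0) = E\left[R(Y, X)\left\{X - \frac{E[XR(Y, X)|Y]}{E[R(Y, X)|Y]}\right\}^{\otimes 2}\right],$$

where $R(Y, X) = \Lambda_0^2(Y|X)O(Y|X)$. See [15] for further details. 
We impose the same assumptions made for complete data in [15]. 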

Condition 4.4. The finite-dimensional parameter space $\Theta$ is compact and contains 
the true parameter $\theta_0$ as an interior point. 

Condition 4.5. (a) The covariate $X$ has bounded support; that is, there exists 
$x_0$ such that $|X| \le x_0$ with probability 1. (b) For any $\theta \ne \theta_0$, the probability 
$P(\theta^TX \ne \theta_0^TX) > 0$. 

Condition 4.6. $F_0(0) = 0$. Let $\tau_{F_0} = \inf\{t : F_0(t) = 1\}$. The support of $Y$ is 
an interval $S[Y] = [l_Y, u_Y]$, and $0 < l_Y \le u_Y < \tau_{F_0}$. 

Condition 4.7. The cumulative hazard function $\Lambda_0$ has strictly positive derivative 
on $S[Y]$, and the joint function $G(y, x)$ of $(Y, X)$ has bounded second order 
(partial) derivative with respect to $y$. 



4.2.1. Characterization of the WLE 

We characterize the WLE's before studying their asymptotic properties. Let $n = 
\sum_{i=1}^N \xi_i$ be the number of observations sampled at phase II. Let $Y_{(1)}, \ldots, Y_{(n)}$ 
be the order statistics of $Y_1, \ldots, Y_N$ with $\xi_i = 1$, $i = 1, \ldots, N$. Let $\Delta_{(i)}$, $X_{(i)}$, 
$U_{(i)}$, and $\xi_{(i)}$ correspond to $Y_{(i)}$; for example, if $Y_{(i)} = Y_j$, then $\Delta_{(i)} = \Delta_j$. Let 
$\pi_{(i)} = \pi_0(V_{(i)})$. Because only fully observed subjects contribute to the weighted 
likelihood, $\hat\Lambda_N(Y_i)$ for subjects with $\xi_i = 0$ does not matter in the maximization. 
In fact, $\hat\Lambda_N(Y_{(i)}) = \hat\Lambda_N(Y_{(i-1)})$ for subjects with $\xi_{(i)} = 0$ for $i \ge 2$. The WLE 
$\hat\Lambda_N$ of $\Lambda$ corresponds to $x = (\Lambda(Y_{(1)}), \ldots, \Lambda(Y_{(n)}))$ that maximizes 

$$\phi(\theta, x) = \sum_{i=1}^n \frac{1}{\pi_{(i)}}\left[\Delta_{(i)}\log\left\{1 - \exp\left(-e^{\theta^TX_{(i)}}x_i\right)\right\} - (1 - \Delta_{(i)})e^{\theta^TX_{(i)}}x_i\right]$$

at $\theta = \hat\theta_N$ subject to $0 \le x_1 \le \cdots \le x_n$. The monotonicity constraint on $x$ is 
imposed to guarantee that an estimate of $\Lambda$ is nondecreasing. Note that $\phi(\theta, x)$ 
is concave in $x$. 

Without loss of generality, we can assume that $\Delta_{(1)} = 1$ and $\Delta_{(n)} = 0$. If 
$\Delta_{(1)} = 0$ or $\Delta_{(n)} = 1$, then $\hat\Lambda_N(Y_{(1)}) = 0$ or $\hat\Lambda_N(Y_{(n)}) = \infty$, so that the first or 
the last summand in $\phi$ is zero. Hence ignoring these terms does not change the 
maximization of the weighted likelihood. 
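
To make the constrained maximization concrete, consider the special case $\theta = 0$ (no covariate effect). There the weighted likelihood depends on $\Lambda$ only through $F = 1 - e^{-\Lambda}$ evaluated at the ordered $Y$'s, and, by a standard fact about isotonic regression, the maximizer under the monotonicity constraint is the weighted isotonic regression of the indicators $\Delta_{(i)}$ with weights $1/\pi_{(i)}$. The sketch below implements this special case only, via the pool-adjacent-violators algorithm; it is not the general algorithm for $\theta \ne 0$, and the function names are hypothetical.

```python
import numpy as np

def weighted_pava(values, weights):
    """Weighted isotonic (nondecreasing) least-squares fit via pool-adjacent-violators."""
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def ipw_current_status_npmle(y, delta, pi):
    """IPW estimate of F at the ordered phase II observation times (theta = 0 only).

    y, delta, pi hold the phase II subjects' times, indicators and
    sampling probabilities; the weights are 1 / pi.
    """
    y = np.asarray(y, float)
    order = np.argsort(y, kind="stable")
    f_hat = weighted_pava(np.asarray(delta, float)[order],
                          1.0 / np.asarray(pi, float)[order])
    return y[order], f_hat

y_sorted, f_hat = ipw_current_status_npmle(
    y=[3.0, 1.0, 2.0], delta=[0, 1, 0], pi=[0.5, 0.5, 0.25])
```

The corresponding estimate of $\Lambda$ is $-\log(1 - \hat F)$ wherever $\hat F < 1$.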

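Proposition 4.1. Assume that $\Delta_{(1)} = 1$ and $\Delta_{(n)} = 0$. Then the WLE 
$(\hat\theta_N, \hat\Lambda_N)$ satisfies 

$$\mathbb{P}_N^{\pi}\hat\Lambda_N(Y)\exp(\hat\theta_N^TX)XQ(Y, \Delta, X; \hat\theta_N, \hat\Lambda_N) = 0,$$
$$\sum_{j \ge i}\frac{1}{\pi_{(j)}}Q(Y_{(j)}, \Delta_{(j)}, X_{(j)}; \hat\theta_N, \hat\Lambda_N)\exp(\hat\theta_N^TX_{(j)}) \le 0, \quad \text{for } i = 1, \ldots, n,$$
$$\mathbb{P}_N^{\pi}Q(Y, \Delta, X; \hat\theta_N, \hat\Lambda_N)\exp(\hat\theta_N^TX)\hat\Lambda_N(Y) = 0.$$

Moreover, the corresponding (in)equalities hold for the WLE's with estimated 
weights and (modified and centered) calibration. 

Proof. The first equation is simply the weighted score equation for $\theta$. 

For the second inequality, let $1_j$ be the vector which has 1's as its last $j$ 
components and zeros as its first $n - j$ components. Let $\hat\Lambda_N \equiv (\hat\Lambda_N(Y_{(i)}))_{i=1}^n$. 
For $\epsilon > 0$, the vector $\hat\Lambda_N + \epsilon 1_j$ satisfies the monotonicity constraint. It follows 
by the definition of the WLE that 

$$0 \ge \lim_{\epsilon \downarrow 0}\frac{\phi(\hat\theta_N, \hat\Lambda_N + \epsilon 1_j) - \phi(\hat\theta_N, \hat\Lambda_N)}{\epsilon}$$
$$= \sum_{i=1}^n \frac{1}{\pi_{(i)}}e^{\hat\theta_N^TX_{(i)}}\left[\Delta_{(i)}\frac{e^{-e^{\hat\theta_N^TX_{(i)}}\hat\Lambda_N(Y_{(i)})}}{1 - e^{-e^{\hat\theta_N^TX_{(i)}}\hat\Lambda_N(Y_{(i)})}} - (1 - \Delta_{(i)})\right]I(i > n - j).$$

Relabeling $i$ and $j$ gives the desired result. Note that the assumption that $\Delta_{(1)} = 1$ 
and $\Delta_{(n)} = 0$ guarantees that the above derivative is finite. 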

The last equality follows for the same reason that 

$$\lim_{h \to 0}\frac{\phi(\hat\theta_N, \hat\Lambda_N + h\hat\Lambda_N) - \phi(\hat\theta_N, \hat\Lambda_N)}{h} = 0.$$

Note that adding terms associated with $\xi_i = 0$ does not contribute to the sum 
in the above derivative. 

For the other four estimators, change weights appropriately. □ 

4.2.2. Consistency 

We prove consistency of the WLE's in the metric given by 

$$d((\theta_1, \Lambda_1), (\theta_2, \Lambda_2)) = |\theta_1 - \theta_2| + \|\Lambda_1 - \Lambda_2\|_{P_Y,2},$$

where $|\theta_1 - \theta_2|$ is the Euclidean distance, 

$$\|\Lambda_1 - \Lambda_2\|^2_{P_Y,2} = \int(\Lambda_1(y) - \Lambda_2(y))^2\,dP_Y,$$

and $P_Y$ is the marginal probability measure of the censoring variable $Y$. The 
idea of our proof is first to show the consistency in the Kullback-Leibler divergence. 
To this end, we use the Glivenko-Cantelli theorem for the IPW empirical 
processes (Theorem 5.1) in Section 5. Then noting that the Kullback-Leibler 
divergence bounds the Hellinger distance, we apply the inequality of Lemma A5 
of [23] which bounds the metric $d$ by the Hellinger distance. 

Theorem 4.3 (Consistency). Under Conditions 3.1, 3.2, 4.4-4.7, the WLE's 
are consistent in the metric $d$. 
Proof. We only prove consistency for the WLE. Proofs for the other four 
estimators are similar. Instead of directly working on $H$, let $\bar H$ be the set of 
all subdistribution functions defined on $[0, \infty]$. We denote the WLE of $F$ as 
$\hat F_N = 1 - \exp(-\hat\Lambda_N)$. 

Define the set $\mathcal{F}$ of functions by 

$$\mathcal{F} = \left\{f(\theta, F) = \delta\left(1 - \bar F(y)^{\exp(\theta^Tx)}\right) + (1 - \delta)\bar F(y)^{\exp(\theta^Tx)} : \theta \in \Theta, F \in \bar H\right\}.$$

Boundedness of $X$ and compactness of $\Theta \subset \mathbb{R}^p$ imply that the set $\{\exp(\theta^Tx) : 
\theta \in \Theta\}$ is Glivenko-Cantelli. The set $\bar H$ is also Glivenko-Cantelli since it is a 
subset of the set of bounded monotone functions. Thus, it follows from boundedness 
of functions in $\mathcal{F}$ and the Glivenko-Cantelli preservation theorem ([40]) 
that $\mathcal{F}$ is Glivenko-Cantelli. 

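Let $0 < \alpha < 1$ be a fixed constant. It follows by concavity of the function 
$u \mapsto \log u$ and Jensen's inequality that 

$$P_0\log\left\{1 + \alpha\left(\frac{f(\theta, F)}{f(\theta_0, F_0)} - 1\right)\right\} \le \log P_0\left\{1 + \alpha\left(\frac{f(\theta, F)}{f(\theta_0, F_0)} - 1\right)\right\} = 0,$$

where equality in the first inequality holds if and only if $1 + \alpha(f(\theta, F)/f(\theta_0, F_0) - 1)$ is constant 
on $S[Y]$, in other words, $(\theta, F) = (\theta_0, F_0)$ on $S[Y]$ by the identifiability condition 
4.5. Note also that by monotonicity of the logarithm 

$$P_0\log\left\{1 + \alpha\left(\frac{f(\theta, F)}{f(\theta_0, F_0)} - 1\right)\right\} \ge P_0[\log\{1 + \alpha(0 - 1)\}] = \log(1 - \alpha).$$

Thus, the set 

$$\mathcal{G} = \left\{\log\left\{1 + \alpha\left(\frac{f(\theta, F)}{f(\theta_0, F_0)} - 1\right)\right\} : \theta \in \Theta, F \in \bar H\right\}$$

has an integrable envelope. To see this, form a sequence $(\theta_n, F_n)$ such that 

$$g_n \equiv \log\left\{1 + \alpha\left(\frac{f(\theta_n, F_n)}{f(\theta_0, F_0)} - 1\right)\right\} \uparrow \sup_{\theta \in \Theta, F \in \bar H}\log\left\{1 + \alpha\left(\frac{f(\theta, F)}{f(\theta_0, F_0)} - 1\right)\right\} \equiv G.$$

Then $\{g_n - \log(1 - \alpha)\}_{n \in \mathbb{N}}$ is a monotone increasing sequence of nonnegative 
functions. By the monotone convergence theorem, 

$$P_0g_n - \log(1 - \alpha) \to P_0G - \log(1 - \alpha) \le -\log(1 - \alpha).$$

Thus we can choose $G \vee (-\log(1 - \alpha))$ as an integrable envelope. Moreover, the 
set $\mathcal{G}$ is Glivenko-Cantelli by a Glivenko-Cantelli preservation theorem [40]. 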

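Now, by the concavity of the map $u \mapsto \log u$ and the definition of the WLE, 
we have 

$$\mathbb{P}_N^{\pi}\log\left\{1 + \alpha\left(\frac{f(\hat\theta_N, \hat F_N)}{f(\theta_0, F_0)} - 1\right)\right\} \ge \mathbb{P}_N^{\pi}\left\{(1 - \alpha)\log(1) + \alpha\log\frac{f(\hat\theta_N, \hat F_N)}{f(\theta_0, F_0)}\right\}$$
$$= \alpha\left\{\mathbb{P}_N^{\pi}\log f(\hat\theta_N, \hat F_N) - \mathbb{P}_N^{\pi}\log f(\theta_0, F_0)\right\} \ge 0.$$

Since $\Theta$ and $\bar H$ are compact, there exists a subsequence of $(\hat\theta_N, \hat F_N)$ that converges 
to $(\theta_\infty, F_\infty) \in \Theta \times \bar H$. Along this subsequence it follows by Theorem 5.1 
that 

$$0 \le \mathbb{P}_N^{\pi}\log\left\{1 + \alpha\left(\frac{f(\hat\theta_N, \hat F_N)}{f(\theta_0, F_0)} - 1\right)\right\} \to_{P^*} P_{\theta_0,F_0}\log\left\{1 + \alpha\left(\frac{f(\theta_\infty, F_\infty)}{f(\theta_0, F_0)} - 1\right)\right\} \le 0.$$

Thus, we have 

$$P_{\theta_0,F_0}\log\left\{1 + \alpha\left(\frac{f(\theta_\infty, F_\infty)}{f(\theta_0, F_0)} - 1\right)\right\} = 0.$$

This is possible only at $(\theta_\infty, F_\infty) = (\theta_0, F_0)$ because $(\theta, F) \mapsto P[\log\{1 + 
\alpha(f(\theta, F)/f(\theta_0, F_0) - 1)\}]$ attains its maximum only at $(\theta_0, F_0)$. Hence we conclude 
that $(\hat\theta_N, \hat F_N)$ converges to $(\theta_0, F_0)$ in the sense of Kullback-Leibler divergence. 
Since the Kullback-Leibler divergence bounds the Hellinger distance, it follows 
by Lemma A5 of [23] that $d((\hat\theta_N, \hat\Lambda_N), (\theta_0, \Lambda_0)) = o_{P^*}(1)$. □ 

4.2.3. Rate of convergence 

We prove that the rate of convergence of the WLE is $N^{1/3}$. We apply the rate theorem 
(Theorem 5.2) in Section 5. Since we proved the consistency of $(\hat\theta_N, \hat\Lambda_N)$ to 
$(\theta_0, \Lambda_0)$ on $S[Y]$, under Condition 4.6 we can restrict the parameter space of 
$\Lambda$ to 

$$H_M = \{\Lambda \in H : M^{-1} \le \Lambda \le M \text{ on } S[Y]\},$$

where $M$ is a positive constant such that $M^{-1} \le \Lambda_0 \le M$ on $S[Y]$. Define 
$\mathcal{M} = \{\ell(\theta, \Lambda) : \theta \in \Theta, \Lambda \in H_M\}$. 

Theorem 4.4 (Rate of convergence). Under Conditions 4.4-4.7, 

$$d((\hat\theta_N, \hat\Lambda_N), (\theta_0, \Lambda_0)) = O_{P^*}(N^{-1/3}).$$

This holds if we replace the WLE by the WLE's with estimated weights and 
(modified and centered) calibration assuming Conditions 3.1 and 3.2. 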
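Proof. Since the rate of convergence for the WLE is easier to verify than for the 
other four estimators, we only prove the theorem for the WLE with modified 
calibration. The cases for the WLE's with estimated weights and (centered) 
calibration are similar. 

We proceed by verifying the conditions in Theorem 5.2. The bound (5.29) 
follows by Lemma 5.2 in Section 5 and Lemma A5 of [23]. 

For the bound (5.30), we follow the proof of (5.28) in [15]. Since $\hat\alpha_N$ is consistent, 
we can specify a small neighborhood $A_{mc,0}$ of the zero vector such that 
$G_{mc}(z; \alpha)$ is contained in a small interval that contains 1 and consists of strictly 
positive numbers. Thus, multiplying the log likelihood by a uniformly bounded 
quantity, $G_{mc}(z; \alpha)$, only requires a slight modification of Huang's proof of his 
Lemma 3.1 to obtain 

$$\sup_Q\log N_{[\,]}(\epsilon, \mathcal{G}_{\mathcal{M}}, L_2(Q)) \lesssim \epsilon^{-1},$$

for $\epsilon$ small enough, where the supremum is taken over all discrete probability 
measures, and $\mathcal{G}_{\mathcal{M}} = \{G_{mc}(\cdot; \alpha)\ell(\theta, \Lambda) : \alpha \in A_{mc,0}, \ell(\theta, \Lambda) \in \mathcal{M}\}$. Thus, it 
follows by Lemma 3.2.2 of [42] that 

$$E^*\|\mathbb{G}_N\|_{\mathcal{G}_{\mathcal{M},\delta}} \lesssim \delta^{1/2}\left(1 + \frac{\delta^{1/2}}{\delta^2\sqrt{N}}\right) \equiv \phi_N(\delta),$$

where the set $\mathcal{G}_{\mathcal{M},\delta}$ is 

$$\{m(\theta, \Lambda, \alpha) - m(\theta_0, \Lambda_0, \alpha) : m(\theta, \Lambda, \alpha) \in \mathcal{G}_{\mathcal{M}}, d((\theta, \Lambda), (\theta_0, \Lambda_0)) \le \delta\}.$$

Apply Theorem 5.2 to conclude that $r_N = N^{1/3}$. □ 

4.2.4. Asymptotic normality of the estimators 

We apply Theorem 3.2 to derive the asymptotic distributions of the WLE's. 

Theorem 4.5 (Asymptotic normality). Under Conditions 3.1, 3.2, 4.4-4.7, 

$$\sqrt{N}(\hat\theta_N - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma),$$
$$\sqrt{N}(\hat\theta_{N,e} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,e}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_e),$$
$$\sqrt{N}(\hat\theta_{N,c} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,c}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_c),$$
$$\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_{mc}),$$
$$\sqrt{N}(\hat\theta_{N,cc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,cc}\tilde\ell_{\theta_0,\Lambda_0} + o_{P^*}(1) \to_d N(0, \Sigma_{cc}),$$

where $\tilde\ell_{\theta_0,\Lambda_0} = I(\theta_0)^{-1}\ell^*_{\theta_0,\Lambda_0}$ is the efficient influence function with complete data 
and $\Sigma$, $\Sigma_e$, $\Sigma_c$, $\Sigma_{mc}$ and $\Sigma_{cc}$ are given in Theorem 3.2. 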

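Proof. We give a proof for the WLE with modified calibration by verifying the
conditions of Theorem 3.2. The cases for the other four estimators are similar.

Condition 3.6 is satisfied with $\beta = 1/3$ by Theorems 4.3 and 4.4. Conditions
3.7-3.9 are verified by [15] with

$$h^*(y) = \Lambda_0(y)\,\frac{E[X\exp(2\theta_0^T X)\,O(Y|X) \mid Y = y]}{E[\exp(2\theta_0^T X)\,O(Y|X) \mid Y = y]}.$$

Since $\mathbb{P}_N^{\pi,mc}\dot\ell_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}} = 0$ by Proposition 4.1, it remains to show that
$\mathbb{P}_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}[h^*] = o_{P^*}(N^{-1/2})$. Let $g_0 = h^* \circ \Lambda_0^{-1}$ be the composition
of $h^*$ and the inverse of $\Lambda_0$. Note that $\Lambda_0$ is a strictly increasing continuous
function by our assumption. Since $g_0(\hat\Lambda_{N,mc}(y))$ is a right continuous function
and has exactly the same jump points as $\hat\Lambda_{N,mc}(y)$, by the characterization of $\hat\Lambda_{N,mc}$
in Proposition 4.1,

$$\mathbb{P}_N^{\pi,mc}\, g_0(\hat\Lambda_{N,mc}(Y))\, e^{\hat\theta_{N,mc}^T X} Q(Y,\Delta,X;\hat\theta_{N,mc},\hat\Lambda_{N,mc}) = 0.$$

By Conditions 4.5-4.7, $h^*$ has bounded derivative. This and the assumption
that $\Lambda_0$ has strictly positive derivative by Condition 4.7 imply that $g_0$ has
bounded derivative, too. So, noting that $h^* = g_0 \circ \Lambda_0$, we have

$$\begin{aligned}
\mathbb{P}_N^{\pi,mc}B_{\hat\theta_{N,mc},\hat\Lambda_{N,mc}}[h^*]
&= \mathbb{P}_N^{\pi,mc}\, h^*(Y)\, e^{\hat\theta_{N,mc}^T X} Q(Y,\Delta,X;\hat\theta_{N,mc},\hat\Lambda_{N,mc}) \\
&= \mathbb{P}_N^{\pi,mc}\{g_0\circ\Lambda_0(Y) - g_0(\hat\Lambda_{N,mc}(Y))\}\, e^{\hat\theta_{N,mc}^T X} Q(Y,\Delta,X;\hat\theta_{N,mc},\hat\Lambda_{N,mc}) \\
&= (\mathbb{P}_N^{\pi,mc} - P_{\theta_0,\Lambda_0})\{g_0\circ\Lambda_0(Y) - g_0(\hat\Lambda_{N,mc}(Y))\}\, e^{\hat\theta_{N,mc}^T X} Q(Y,\Delta,X;\hat\theta_{N,mc},\hat\Lambda_{N,mc}) \\
&\quad + P_{\theta_0,\Lambda_0}\{g_0\circ\Lambda_0(Y) - g_0(\hat\Lambda_{N,mc}(Y))\}\, e^{\hat\theta_{N,mc}^T X} Q(Y,\Delta,X;\hat\theta_{N,mc},\hat\Lambda_{N,mc}).
\end{aligned}$$

[15] showed that the second term in the display is $o_{P^*}(N^{-1/2})$. We show that
the first term in the display is also $o_{P^*}(N^{-1/2})$. Let $C > 0$ be an arbitrary
constant. Define for a fixed constant $\eta > 0$

$$\mathcal{D}(\eta) = \{\psi(y,\delta,x;\theta,\Lambda) : d((\theta,\Lambda),(\theta_0,\Lambda_0)) < \eta,\ \Lambda \in \mathcal{H}_M\},$$

where $\psi(y,\delta,x;\theta,\Lambda) = \{g_0\circ\Lambda_0(y) - g_0(\Lambda(y))\}\, e^{\theta^T x} Q(y,\delta,x;\theta,\Lambda)$. Because Huang
(1996) showed that $\mathcal{D}(\eta)$ is Donsker for every $\eta > 0$ and that $\|\mathbb{G}_N\|_{\mathcal{D}(CN^{-1/3})} = o_{P^*}(1)$,
it follows by Lemma 5.4 with $\mathcal{F}_N$ replaced by $\mathcal{D}(CN^{-1/3})$ that $\|\mathbb{G}_N^{\pi,mc}\|_{\mathcal{D}(CN^{-1/3})} = o_{P^*}(1)$.
This completes the proof. □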

Unlike the previous example, $\tilde\ell_0$ depends on additional unknown functions,
and the method used in the previous example does not work to estimate asymptotic
variances in the present case. See the discussion in Section 6.

5. General results for IPW empirical processes 

The IPW empirical measure and IPW empirical process inherit important prop- 
erties from the empirical measure and empirical process, respectively. We em- 
phasize the similarity between empirical processes and IPW empirical processes. 

5.1. Glivenko-Cantelli theorem 

The next theorem states that the Glivenko-Cantelli property for complete data 
is preserved under two-phase sampling. 

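Theorem 5.1. Suppose that $\mathcal{F}$ is $P_0$-Glivenko-Cantelli. Then

$$\|\mathbb{P}_N^{\pi} - P_0\|_{\mathcal{F}} \to_{P^*} 0, \qquad (5.26)$$

where $\|\cdot\|_{\mathcal{F}}$ is the supremum norm. This also holds if we replace $\mathbb{P}_N^{\pi}$ by $\mathbb{P}_N^{\pi,e}$,
$\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ or $\mathbb{P}_N^{\pi,cc}$, assuming Conditions 3.1 and 3.2.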
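
As an informal numerical illustration of (5.26), the following Python sketch simulates two-phase stratified sampling without replacement and evaluates the IPW empirical measure on the indicator functions $f_t(x) = 1\{x \le t\}$ over a finite grid of $t$. The uniform phase I distribution, the two strata, the sampling fractions and the function name `ipw_sup_error` are illustrative assumptions and are not taken from the paper.

```python
import random

def ipw_sup_error(N, probs, seed=0, grid=19):
    """Sup over a grid of |IPW measure of {x <= t} - t| for X ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    # Phase I: i.i.d. sample; stratum 0 if X < 0.5, stratum 1 otherwise (assumed design).
    xs = [rng.random() for _ in range(N)]
    strata = {0: [], 1: []}
    for x in xs:
        strata[0 if x < 0.5 else 1].append(x)
    weighted = []  # (value, inverse-probability weight) for phase II units
    for j, members in strata.items():
        N_j = len(members)
        n_j = int(N_j * probs[j])  # n_j = [N_j p_j]
        if n_j == 0:
            continue
        # Phase II: draw n_j units without replacement; each gets weight N_j / n_j.
        for x in rng.sample(members, n_j):
            weighted.append((x, N_j / n_j))
    ts = [(k + 1) / (grid + 1) for k in range(grid)]
    # The true measure of {x <= t} under Uniform(0, 1) is t.
    return max(abs(sum(w for x, w in weighted if x <= t) / N - t) for t in ts)

for N in (200, 2000, 20000):
    print(N, round(ipw_sup_error(N, {0: 0.2, 1: 0.5}), 4))
```

The printed supremum errors would be expected to shrink as $N$ grows, in line with the theorem, although a single seeded run is only suggestive.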

5.2. Rate of convergence 

The rate of convergence of an M-estimator with complete data is often established
via maximal inequalities for the empirical processes. If we follow the same
line of reasoning, it is natural to derive the maximal inequalities for IPW empirical
processes, though this may require some effort. Fortunately, these maximal
inequalities for empirical processes (or slight modifications of them) suffice to
establish the same rate of convergence under two-phase sampling.

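Theorem 5.2. Let $\mathcal{M} = \{m_\theta : \theta \in \Theta\}$ be the set of criterion functions and
define $\mathcal{M}_\delta = \{m_\theta - m_{\theta_0} : d(\theta,\theta_0) < \delta\}$ for some fixed $\delta > 0$ where $d$ is a
semimetric on the parameter space $\Theta$.

(1) Suppose that for every $\theta$ in a neighborhood of $\theta_0$,

$$P_0(m_\theta - m_{\theta_0}) \lesssim -d^2(\theta,\theta_0); \qquad (5.27)$$

here $a \lesssim b$ means $a \le Kb$ for some constant $K \in (0,\infty)$. Assume that there
exists a function $\phi_N$ such that $\delta \mapsto \phi_N(\delta)/\delta^{\alpha}$ is decreasing for some $\alpha < 2$ (not
depending on $N$) and for every $N$,

$$E^*\|\mathbb{G}_N\|_{\mathcal{M}_\delta} \lesssim \phi_N(\delta), \qquad (5.28)$$

where $\mathbb{G}_N$ is the empirical process. If the WLE $\hat\theta_N$ satisfying
$\mathbb{P}_N^{\pi} m_{\hat\theta_N} \ge \mathbb{P}_N^{\pi} m_{\theta_0} - O_{P^*}(r_N^{-2})$
converges in outer probability to $\theta_0$, then $r_N d(\hat\theta_N,\theta_0) = O_{P^*}(1)$ for
every sequence $r_N$ such that $r_N^2\phi_N(1/r_N) \le \sqrt{N}$ for every $N$.

(2) Suppose Condition 3.2 holds. Suppose also that for every $\theta \in \Theta$ in a
neighborhood of $\theta_0$,

$$P_0\{G_{mc}(V;\alpha)(m_\theta - m_{\theta_0})\} \lesssim -d^2(\theta,\theta_0) + |\alpha - \alpha_0|^2. \qquad (5.29)$$

Assume that

$$E^*\|\mathbb{G}_N\|_{\mathcal{GM}_\delta} \lesssim \phi_N(\delta), \qquad (5.30)$$

where $\mathcal{GM}_\delta = \{G_{mc}(\cdot;\alpha)f : |\alpha| \le \delta,\ \alpha \in \mathcal{A}_N,\ f \in \mathcal{M}_\delta\}$ for some $\mathcal{A}_N \subset \mathcal{A}_{mc}$.
Then the WLE with modified calibration, $\hat\theta_{N,mc}$, satisfying
$\mathbb{P}_N^{\pi,mc} m_{\hat\theta_{N,mc}} \ge \mathbb{P}_N^{\pi,mc} m_{\theta_0} - O_{P^*}(r_N^{-2})$
has the same rate of convergence as the WLE in part (1)
if it is consistent.

(3) Suppose Conditions 3.1 and 3.2 hold. Under the same conditions as in (2)
with $G_{mc}$ replaced by $\pi_0/G_{\#}$ for $\# \in \{e, c, cc\}$, the same conclusions hold for the
WLE with estimated weights, $\hat\theta_{N,\#}$, satisfying
$\mathbb{P}_N^{\pi,\#} m_{\hat\theta_{N,\#}} \ge \mathbb{P}_N^{\pi,\#} m_{\theta_0} - O_{P^*}(r_N^{-2})$
with $\# \in \{e, c, cc\}$ respectively.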

Remark 5.1. The key to establishing a general theorem for the rate of convergence
is to make use of the boundedness of the weights in the IPW empirical
process and also to deal with the dependence of the weights. In treating independent
bootstrap weights in the weighted bootstrap, Lemmas 1-3 of [21] require the
boundedness of the bootstrap weights, because the product of an unbounded weight and a
bounded function is no longer bounded. Our theorem exploits the boundedness
of sampling indicators in the IPW empirical processes by applying a multiplier
inequality for the case of bounded weights (Lemma 5.1) to cover more general
cases.

The following is a multiplier inequality for bounded exchangeable weights.
Note that the sum of stochastic processes in the second term is divided by $n^{1/2}$
rather than $k^{1/2}$.



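Lemma 5.1. For i.i.d. stochastic processes $Z_1, \ldots, Z_n$, every bounded, exchangeable
random vector $(\xi_1, \ldots, \xi_n)$ with each $\xi_i \in [l, u]$ that is independent of
$Z_1, \ldots, Z_n$, and any $1 \le n_0 \le n$,

$$E\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_i Z_i\right\|_{\mathcal{F}}
\le 2(n_0 - 1)\,\frac{1}{n}\sum_{i=1}^{n}E^*\|Z_i\|_{\mathcal{F}}\;E\max_{1\le i\le n}\frac{|\xi_i|}{\sqrt{n}}
+ 2(u - l)\max_{n_0\le k\le n}E\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{k}Z_i\right\|_{\mathcal{F}}.$$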




The bound (5.30) is not difficult to verify in the presence of the bound (5.28)
since $G_{mc}(\cdot;\alpha)$ is a bounded monotone function indexed by a finite dimensional
parameter. The bound (5.29) may be verified through the lemma below for some
applications including the Cox model with current status data.

Lemma 5.2. Suppose Conditions 3.1 and 3.2 hold. Let $m_\theta$ be the log likelihood
$\log p_\theta$ where $p_\theta$ is the density with dominating measure $\mu$, and $d$ is the Hellinger
distance. Then the bound (5.29) and the corresponding bounds for the WLE's
with estimated weights and (centered) calibration hold.



5.3. Donsker theorem 

The next theorem yields weak convergence of the IPW empirical processes under 
sampling without replacement. 

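Theorem 5.3. Suppose that $\mathcal{F}$ with $\|P_0\|_{\mathcal{F}} < \infty$ is $P_0$-Donsker and Conditions
3.1 and 3.2 hold. Then,

$$\mathbb{G}_N^{\pi} \rightsquigarrow \mathbb{G}^{\pi} \equiv \mathbb{G} + \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb{G}_j, \qquad (5.31)$$

$$\mathbb{G}_N^{\pi,e} \rightsquigarrow \mathbb{G}^{\pi,e} \equiv \mathbb{G} + \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb{G}_j(\cdot - Q_e\,\cdot), \qquad (5.32)$$

$$\mathbb{G}_N^{\pi,c} \rightsquigarrow \mathbb{G}^{\pi,c} \equiv \mathbb{G} + \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb{G}_j(\cdot - Q_c\,\cdot), \qquad (5.33)$$

$$\mathbb{G}_N^{\pi,mc} \rightsquigarrow \mathbb{G}^{\pi,mc} \equiv \mathbb{G} + \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb{G}_j(\cdot - Q_{mc}\,\cdot), \qquad (5.34)$$

$$\mathbb{G}_N^{\pi,cc} \rightsquigarrow \mathbb{G}^{\pi,cc} \equiv \mathbb{G} + \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1-p_j}{p_j}}\,\mathbb{G}_j(\cdot - Q_{cc}\,\cdot), \qquad (5.35)$$

in $\ell^{\infty}(\mathcal{F})$ where the $P_0$-Brownian bridge process, $\mathbb{G}$, indexed by $\mathcal{F}$ and the $P_{0|j}$-Brownian
bridge processes, $\mathbb{G}_j$, indexed by $\mathcal{F}$ are all independent.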

Remark 5.2. The integrability hypothesis $\|P_0\|_{\mathcal{F}} < \infty$ is only required for the
IPW empirical processes with estimated weights and (modified and centered)
calibration.
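
To make the limit (5.31) concrete, the Python sketch below evaluates the variance of $\mathbb{G}^{\pi}f$ implied by the independence of $\mathbb{G}$ and the $\mathbb{G}_j$, namely $\mathrm{Var}_0(f) + \sum_j \nu_j\{(1-p_j)/p_j\}\mathrm{Var}_{0|j}(f)$, and compares it with a small Monte Carlo experiment. The choice $f(x) = x$, the uniform phase I law, the two strata, the sampling fractions and the helper names `limit_variance` and `scaled_ipw_error` are assumptions made only for this illustration.

```python
import random
import statistics

def limit_variance(var_total, strata):
    """Variance of G f + sum_j sqrt(nu_j (1 - p_j) / p_j) G_j f for independent bridges.

    strata: iterable of (nu_j, p_j, within-stratum variance of f) triples.
    """
    return var_total + sum(nu * (1.0 - p) / p * v for nu, p, v in strata)

def scaled_ipw_error(N, probs, rng):
    """One draw of sqrt(N) * (IPW mean of X - 1/2) for X ~ Uniform(0, 1)."""
    strata = {0: [], 1: []}
    for _ in range(N):
        x = rng.random()
        strata[0 if x < 0.5 else 1].append(x)
    total = 0.0
    for j, members in strata.items():
        n_j = int(len(members) * probs[j])  # n_j = [N_j p_j]
        if n_j > 0:
            # Sampling without replacement within the stratum, weight N_j / n_j.
            total += len(members) / n_j * sum(rng.sample(members, n_j))
    return N ** 0.5 * (total / N - 0.5)

probs = {0: 0.2, 1: 0.5}
# For f(x) = x: Var_0 f = 1/12 and each half-interval stratum has variance 1/48.
theory = limit_variance(1 / 12, [(0.5, probs[0], 1 / 48), (0.5, probs[1], 1 / 48)])
rng = random.Random(2012)
draws = [scaled_ipw_error(2000, probs, rng) for _ in range(400)]
print(theory, statistics.pvariance(draws))
```

With only 400 replications the Monte Carlo variance is itself noisy, so agreement with the formula should be read as approximate.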

If the index set $\mathcal{F}$ is Donsker, then it follows by the previous theorem and
Lemma 2.3.11 of [42] that asymptotic equicontinuity in probability and in mean
holds for the metric that depends on the limit process. In applications, it is of
interest to have these results for the original metric $\rho_{P_0}(f, g) = \sigma_{P_0}(f - g)$.

Theorem 5.4. Let $\mathcal{F}$ be Donsker and define $\mathcal{F}_\delta = \{f - g : f, g \in \mathcal{F},\ \rho_{P_0}(f, g) < \delta\}$
for some fixed $\delta > 0$. Then, for every sequence $\delta_N \downarrow 0$,

$$E^*\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}_{\delta_N}} \to 0,$$

and consequently, $\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}_{\delta_N}} = o_{P^*}(1)$. Moreover, $\|\mathbb{G}_N^{\pi,\#}\|_{\mathcal{F}_{\delta_N}} = o_{P^*}(1)$ for
$\# \in \{e, c, mc, cc\}$ assuming Conditions 3.1 and 3.2.

We end this section with two important lemmas. The first lemma is an ex- 
tension of Lemma 3.3.5 of [42] and will be used in our proof of Theorem 3.1 to 
verify asymptotic equicontinuity. 

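Lemma 5.3. Suppose $\mathcal{F} = \{\psi_{\theta,h} - \psi_{\theta_0,h} : \|\theta - \theta_0\| < \delta,\ h \in \mathcal{H}\}$ is $P_0$-Donsker
for some $\delta > 0$ and that $\sup_{h\in\mathcal{H}} P_0(\psi_{\theta,h} - \psi_{\theta_0,h})^2 \to 0$, as $\theta \to \theta_0$. If $\hat\theta_N$
converges in outer probability to $\theta_0$, then

$$\|\mathbb{G}_N^{\pi}(\psi_{\hat\theta_N,h} - \psi_{\theta_0,h})\|_{\mathcal{H}} = o_{P^*}(1).$$

This also holds if we replace $\mathbb{G}_N^{\pi}$ by $\mathbb{G}_N^{\pi,e}$, $\mathbb{G}_N^{\pi,c}$, $\mathbb{G}_N^{\pi,mc}$ or $\mathbb{G}_N^{\pi,cc}$, assuming
Conditions 3.1 and 3.2 hold and $\|P_0\|_{\mathcal{F}} < \infty$.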



The second lemma is used to verify asymptotic equicontinuity in the proof 
of Theorem 3.2, the first part for the IPW empirical process and the second 
part for the other four IPW empirical processes with estimated weights and 
(modified and centered) calibration. 

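Lemma 5.4. Let $\mathcal{F}_N$ be a sequence of decreasing classes of functions such that
$\|\mathbb{G}_N\|_{\mathcal{F}_N} = o_{P^*}(1)$. Assume that there exists an integrable envelope for $\mathcal{F}_{N_0}$ for
some $N_0$. Then $E\|\mathbb{G}_N\|_{\mathcal{F}_N} \to 0$ as $N \to \infty$. As a consequence, $\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}_N} = o_{P^*}(1)$.

Suppose, moreover, that $\mathcal{F}_{N_1}$ is $P_0$-Glivenko-Cantelli with $\|P_0\|_{\mathcal{F}_{N_1}} < \infty$ for
some $N_1$, and that every $f = f_N \in \mathcal{F}_N$ converges to zero either pointwise
or in $L_1(P_0)$ as $N \to \infty$. Then $\|\mathbb{G}_N^{\pi,e}\|_{\mathcal{F}_N} = o_{P^*}(1)$, $\|\mathbb{G}_N^{\pi,c}\|_{\mathcal{F}_N} = o_{P^*}(1)$,
$\|\mathbb{G}_N^{\pi,mc}\|_{\mathcal{F}_N} = o_{P^*}(1)$ and $\|\mathbb{G}_N^{\pi,cc}\|_{\mathcal{F}_N} = o_{P^*}(1)$, assuming Conditions 3.1 and
3.2.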



6. Discussion 



We developed asymptotic theory for weighted likelihood estimation under stratified
sampling without replacement. To deal with difficulties due to the dependence
of observations, we established general results for the IPW empirical
processes. These results and some methods of proof in this paper may be applicable
to other estimation procedures. For instance, the weighted Kaplan-Meier
estimator can be shown to be asymptotically Gaussian by the functional delta
method and the Donsker theorem for the IPW empirical processes [33]. Another
application is the weighted estimating equations approach. Beyond these rather
obvious applications, it is interesting to study the asymptotic behavior of estimators
proposed under independence assumptions that do not involve
inverse probability weighting. In particular, whether or not some known efficient
estimators under Bernoulli sampling such as those proposed by [25] are "efficient"
under our sampling scheme is an open problem. (See [22] for the definition of
efficiency with non-i.i.d. data.)

There are several other open problems. [24] developed a method for computing
the observed information corresponding to the finite dimensional parameter
in the context of the profile likelihood with complete data. This method is particularly
useful when the asymptotic variance is given neither as a closed formula
nor as an expectation of a known function. One of the examples they considered is in
fact the Cox model with interval censoring. Their method is not directly applicable
to two-phase sampling, and a general method of variance estimation
would be useful. Another open problem is to study other complex survey designs.
Stratified sampling without replacement is sufficiently simple for the existing
bootstrap empirical process theory to apply. Other complex designs may provide
interesting theoretical challenges, perhaps in connection with bootstrap
empirical process theory. In Section 4, we followed [41] in imposing somewhat
stronger conditions than necessary. Those assumptions allow us to stay within
(IPW) empirical process theory, but removing unnecessary conditions raises
the question of sorting out connections between empirical process theory and
martingale theory as used for example in [1] to study Cox's partial likelihood estimators
with complete data. Empirical process methods will undoubtedly need
to be used for further study of the behavior of the WLE methods under model
misspecification.

7. Appendix 

We repeatedly use the notation for empirical measures and processes introduced
in Section 2 following [6]. The fundamental idea of [6] is to view $\mathbb{G}^{\xi}_{j,N_j}$ as the
exchangeably weighted bootstrap empirical process corresponding to
$\mathbb{G}_{j,N_j} = \sqrt{N_j}(\mathbb{P}_{j,N_j} - P_{0|j})$ for $j = 1, \ldots, J$. The processes $\mathbb{G}^{\xi}_{j,N_j}$ converge weakly to
$\sqrt{p_j(1 - p_j)}\,\mathbb{G}_j$ for independent $P_{0|j}$-Brownian bridge processes $\mathbb{G}_j$, $j = 1, \ldots, J$,
in $\ell^{\infty}(\mathcal{F})$ for Donsker classes $\mathcal{F}$.

Asymptotic linearity and the limiting distributions of $\hat\alpha_N$ in binary regression
and (modified and centered) calibration are given by the following proposition.
The proof requires a Glivenko-Cantelli theorem for $\mathbb{P}_N^{\pi}$ whose proof is independent
of Proposition 7.1.

Proposition 7.1. Under Condition 3.1, $\hat\alpha_N$ is consistent for $\alpha_0$, and




- MV^)) + 0*p{l) 




where $\mathbb{G}_j$ are independent $P_{0|j}$-Brownian bridge processes.
Under Condition 3.2, $\hat\alpha_N^{c}$, $\hat\alpha_N^{mc}$ and $\hat\alpha_N^{cc}$ are consistent, and
VN{a% - ao) 

4=1 



1 - , 



Pj 



ijZ, 



Op,(l) 



iV(ar - ao) 

N 



Op*(l) 



1 - , 



and 



Nia^r^ - ao) 
-G(0)-i 



N 



^ 2 — 1 



^o(X^.) 



-G(0)-MPoi^^(^-Mz) 
7ro(F) 



op*(l) 
-1 J 



where the $P_{0|j}$-Brownian bridge processes, $\mathbb{G}_j$, are independent.

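Proof. We first consider estimated weights. Define $M_N(\alpha) = \mathbb{P}_N m_\alpha$ and $M(\alpha) = P_{\alpha_0} m_\alpha$
where $m_\alpha(Z,\xi) = \log(\{p_\alpha(\xi|Z) + p_{\alpha_0}(\xi|Z)\}/2)$. We again apply Theorem
5.7 of [41] for a consistency proof. Because $p_\alpha(\xi|Z)$ is a valid marginal
density of a single observation $\xi$ given $Z$, the argument of [41], page 66, can
be used to verify the second condition of the theorem. We verify the first condition
of Theorem 5.7 of [41]. Let $\bar G_e(z;\alpha) = \{G_e(z^T\alpha) + G_e(z^T\alpha_0)\}/2$. Then
$m_\alpha(z,\xi) = \xi\log\bar G_e(z;\alpha) + (1 - \xi)\log(1 - \bar G_e(z;\alpha))$. We rewrite $\mathbb{P}_N m_\alpha$ as

$$\begin{aligned}
\mathbb{P}_N m_\alpha &= \frac{1}{N}\sum_{i=1}^{N}\left\{\xi_i\log\bar G_e(Z_i;\alpha) + (1 - \xi_i)\log\left(1 - \bar G_e(Z_i;\alpha)\right)\right\} \\
&= \sum_{j=1}^{J}\frac{N_j}{N}\frac{n_j}{N_j}\frac{1}{n_j}\sum_{i=1}^{N_j}\xi_{j,i}\log\bar G_e(Z_{j,i};\alpha)
+ \sum_{j=1}^{J}\frac{N_j}{N}\frac{N_j - n_j}{N_j}\frac{1}{N_j - n_j}\sum_{i=1}^{N_j}(1 - \xi_{j,i})\log\left(1 - \bar G_e(Z_{j,i};\alpha)\right).
\end{aligned}$$

Thus, if we establish that both $\mathcal{S}_{0,j} = \{\log(1 - \bar G_e(z;\alpha)) : \alpha \in \mathbb{R}^{J+k},\ v \in \mathcal{V}_j\}$
and $\mathcal{S}_{1,j} = \{\log\bar G_e(z;\alpha) : \alpha \in \mathbb{R}^{J+k},\ v \in \mathcal{V}_j\}$ are $P_0$-Glivenko-Cantelli for $j =
1, \ldots, J$, it follows from Theorem 5.1 applied to sampled subjects and non-sampled
subjects in each stratum separately that $\mathbb{P}_N m_\alpha$ converges in probability
to

$$P_0 m_\alpha = \sum_{j=1}^{J}\nu_j p_j P_0\left(\log\bar G_e(Z;\alpha) \mid V \in \mathcal{V}_j\right)
+ \sum_{j=1}^{J}\nu_j(1 - p_j)P_0\left(\log\left(1 - \bar G_e(Z;\alpha)\right) \mid V \in \mathcal{V}_j\right)$$

uniformly in $\alpha$. Note that the method of estimated weights does not estimate the
sampling probability for the subjects in a stratum if the sampling probability
is 1. Thus, we can assume that $G_e(Z^T\alpha_0) \le \sigma' < 1$. Hence we have $\log(\sigma/2) \le
\log\bar G_e(Z;\alpha) \le 0$ and $\log(\{1 - \sigma'\}/2) \le \log(1 - \bar G_e(Z;\alpha)) \le 0$ for all $j =
1, \ldots, J$ and $\alpha \in \mathbb{R}^{J+k}$. This implies that all sets $\mathcal{S}_{k,j}$, $k = 0, 1$, have integrable
envelopes. Now it suffices to show that all sets are VC subgraph classes. Note
first that $\{z^T\alpha : \alpha \in \mathbb{R}^{J+k}\}$ is a VC subgraph class by Lemma 2.6.15 of [42].
Note also that $G_e$ and the logarithm are monotone functions. Because a map
by a monotone function, addition and multiplication all preserve the property
of the VC subgraph class by Lemma 2.6.17 of [42], our claim follows and hence
the first condition is verified. Since we have by concavity of the logarithm and
the property of $\hat\alpha_N$ that

$$\begin{aligned}
M_N(\hat\alpha_N) &\ge \frac{1}{2}\mathbb{P}_N\log p_{\hat\alpha_N}(\xi|V) + \frac{1}{2}\mathbb{P}_N\log p_{\alpha_0}(\xi|V) \\
&\ge \frac{1}{2}\mathbb{P}_N\log p_{\alpha_0}(\xi|V) + \frac{1}{2}\mathbb{P}_N\log p_{\alpha_0}(\xi|V) = M_N(\alpha_0),
\end{aligned}$$

consistency follows from Theorem 5.7 of [41].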

We apply Theorem 3.3.1 of [42] to show asymptotic normality of $\hat\alpha_N$. Define



and 



Note that $\Phi_{N,e}(\hat\alpha_N) = 0$ because $(\partial/\partial\alpha)\mathbb{P}_N\log p_\alpha = \Phi_{N,e}(\alpha)$. Note also that
$\Phi_e(\alpha_0) = 0$ since $G_e(Z^T\alpha_0) = p_j$ when $V \in \mathcal{V}_j$. It follows by the decomposition
(10) of the inverse probability weighted empirical processes in [6] that



N{^N,e{ao) - $e(ao)) = ViV$Ar,e(ao) 



N 



N¥ 



G,{ZTa^){l-Ge{ZTao)) 

N 



(C-Ge(^' ao)) 



iVP 



AT 



G,{Z^ao)Z 
\-Ge{Z^aa) 



^ V iV J" G,{ZTao) 1 - Ge(Z^ao) 



+\/iVP. 



N 



- 1 



Ge(Z^ao)^ 



GeiZ^ao) Jl-GeiZ^ao)' 



Since $\pi_0(V) = n_j/N_j$ and $G_e(Z^T\alpha_0) = p_j$ when $V \in \mathcal{V}_j$, the first term converges
to



1 V Pji^-Pj) 



GjGeiZ^ao)Z. 



The second term can be written as 



Pj 



1 1 



AT. 



Y,Ge{Zl,ao)Z,, 



Since $n_j = [N_j p_j]$ by assumption, it is easy to see that $-N_j^{-1/2} < \sqrt{N_j}(n_j/N_j - p_j) \le 0$,
and hence $\sqrt{N_j}(n_j/N_j - p_j) \to 0$. Since $N_j^{-1}\sum_{i=1}^{N_j}G_e(Z_{j,i}^T\alpha_0)Z_{j,i} = O_{P^*}(1)$
by the weak law of large numbers and $\sqrt{N_j/N} \to \sqrt{\nu_j}$, the second term
converges to zero in probability. The weak convergence of $\sqrt{N}(\Phi_{N,e} - \Phi_e)(\alpha_0)$
follows from Slutsky's theorem.
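
The rounding bound used in this step is elementary and can be checked directly. The sketch below, with the hypothetical helper name `scaled_rate_gap` and arbitrarily chosen values of $N_j$ and $p_j$, evaluates $\sqrt{N_j}(n_j/N_j - p_j)$ for $n_j = [N_j p_j]$ and shows that it stays in $(-N_j^{-1/2}, 0]$.

```python
import math

def scaled_rate_gap(N_j, p_j):
    """Return sqrt(N_j) * (n_j / N_j - p_j) with n_j = floor(N_j * p_j)."""
    n_j = math.floor(N_j * p_j)
    return math.sqrt(N_j) * (n_j / N_j - p_j)

for N_j in (7, 101, 10007):
    gap = scaled_rate_gap(N_j, 0.3)
    # The gap lies between -1/sqrt(N_j) and 0, so it vanishes as N_j grows.
    print(N_j, gap, -1 / math.sqrt(N_j))
```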

For asymptotic equicontinuity of the process, it suffices to consider a compact
subset $\mathcal{A}_{e,0} \subset \mathbb{R}^{J+k}$ of which $\alpha_0$ is an interior point, since $\hat\alpha_N$ is consistent. Let



<^Q,l(") 



</'a,2(u) 



'Kq(v)z 



02 



/ 



Ge(^^a){l-Ge(2^a)} 
/ 



Ge(^^a) 



V 



{Ge(^^a)y 
Ge(z^a) 



y®2 



1 - Ge(z^a) 
Taylor's theorem gives 



G,{77a) 



\ 



{Ge(z^a)}^ 
1 - Ge(z^a) 



7ro(w) 
where $\alpha_j^*$, $j = 1, 2$, are some convex combinations of $\alpha$ and $\alpha_0$. Thus,



= {VI - Po)<t>o.i.iVN{a ~ ao) + (Pjv - Po)<^a^2VA^(a - ao) 



(7.36) 



To show this is $o_{P^*}(1 + \sqrt{N}|\alpha - \alpha_0|)$, we first show that the sets
$\mathcal{F}_k = \{\phi_{\alpha,k} : \alpha \in \mathcal{A}_{e,0}\}$, $k = 1, 2$, are Glivenko-Cantelli. It is easy to see that
$\{z^T\alpha : \alpha \in \mathcal{A}_{e,0}\}$ is Glivenko-Cantelli. Since $G_e \in C$ by assumption, $\phi_{\alpha,k}$, $k = 1, 2$,
are uniformly bounded in $\alpha \in \mathcal{A}_{e,0}$. Thus, the sets $\mathcal{F}_k$, $k = 1, 2$, are both Glivenko-Cantelli by
the Glivenko-Cantelli preservation theorem [40]. For the third term in (7.36),
apply the dominated convergence theorem with $P_0(\xi|V) = \sum_{j=1}^{J}(n_j/N_j)I(V \in \mathcal{V}_j)$.

Since $\dot\Phi_e(\alpha_0) = -S_0$, apply Theorem 3.3.1 of [42] to obtain
\/7V(q!jv — ao) 



3 = 1 



jGeiZ'^ao)Z. 



This completes the proof. 

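Next we consider modified calibration with $\hat\alpha_N = \hat\alpha_N^{mc}$. The cases for (centered)
calibration (i.e., $\hat\alpha_N = \hat\alpha_N^{c}$ and $\hat\alpha_N = \hat\alpha_N^{cc}$) are similar. Define
$\Phi_{N,mc}(\alpha) = \mathbb{P}_N^{\pi}G_{mc}(V;\alpha)Z - \mathbb{P}_N Z$ and $\Phi_{mc}(\alpha) = P_0[(G_{mc}(V;\alpha) - 1)Z]$. Note that
$\Phi_{N,mc}(\hat\alpha_N) = 0$ and $\Phi_{mc}(0) = 0$. We apply Theorem 5.7 of [41] for a consistency proof. For
the first condition of the theorem, we have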


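$$\begin{aligned}
\sup_{\alpha\in\mathbb{R}^k}\|\Phi_{N,mc}(\alpha) - \Phi_{mc}(\alpha)\|
&= \sup_{\alpha\in\mathbb{R}^k}\left\|\mathbb{P}_N^{\pi}G_{mc}(V;\alpha)Z - \mathbb{P}_N Z - P_0\{G_{mc}(V;\alpha) - 1\}Z\right\| \\
&\le \sup_{\alpha\in\mathbb{R}^k}\left\|\frac{1}{N}\sum_{i=1}^{N}\frac{\xi_i}{\pi_0(V_i)}G_{mc}(V_i;\alpha)Z_i - P_0 G_{mc}(\cdot;\alpha)Z\right\|
+ \left\|\frac{1}{N}\sum_{i=1}^{N}Z_i - P_0 Z\right\|,
\end{aligned}$$

where $\|\cdot\|$ is the Euclidean norm. Since $\alpha$ is a vector in $\mathbb{R}^k$ and $G$ is monotone,
$\{G_{mc}(\cdot;\alpha) : \alpha \in \mathbb{R}^k\}$ is a VC subgraph class by Lemmas 2.6.15 and 2.6.18 of [42].
Boundedness of $G$ implies that the set $\{G_{mc}(v;\alpha)z : \alpha \in \mathbb{R}^k\}$ is $P_0$-Glivenko-Cantelli
by the Glivenko-Cantelli preservation theorem [40]. Then the first term
is $o_{P^*}(1)$ by Theorem 5.1. The second term is $o_{P^*}(1)$ by the weak law of large
numbers.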

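The second condition of the theorem is that for any $\epsilon > 0$, $\inf_{|\alpha| > \epsilon}\|\Phi_{mc}(\alpha)\| > 0$.
Suppose, to the contrary, that $\inf_{|\alpha| > \epsilon}\|\Phi_{mc}(\alpha)\| = 0$ for some $\epsilon > 0$. Then
there exists a sequence $\{\alpha^{(m)}\} \subset \mathbb{R}^k$ with $|\alpha^{(m)}| > \epsilon$ for each $m = 1, 2, \ldots$, such
that

$$\|\Phi_{mc}(\alpha^{(m)})\| \to 0.$$

Let $\Phi_{mc}^{j}(\alpha)$, $j = 1, \ldots, k$, be the $j$th element of $\Phi_{mc}(\alpha)$. Since the norm $\|\cdot\|$ is the
Euclidean norm, each element $\Phi_{mc}^{j}(\alpha^{(m)})$ converges to zero. If $\alpha^{(m)}$ converges
to $\alpha^{(\infty)}$ with $|\alpha^{(\infty)}| < \infty$, then by the dominated convergence theorem and
Taylor's theorem,

$$0 = P_0\left[\{G_{mc}(V;\alpha^{(\infty)}) - 1\}Z\right] = P_0\left[(\pi_0(V)^{-1} - 1)\dot G_{mc}(V;\alpha^*)Z^{\otimes 2}\right]\alpha^{(\infty)}$$

for some $\alpha^*$ with $|\alpha^*| \le |\alpha^{(\infty)}|$. Because $P_0(\pi_0(V)^{-1} - 1)\dot G_{mc}(V;\alpha^*)Z^{\otimes 2}$ is
positive definite by assumption, $\alpha^{(\infty)}$ must be zero, which contradicts the fact
that $|\alpha^{(\infty)}| \ge \epsilon$.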

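Now suppose that some elements of $\alpha^{(m)}$ diverge. Then, a further subsequence
$\alpha^{(m')}$ converges to some $\alpha^{(\infty)}$ whose elements are extended real numbers. Define
a unit vector $\beta^{(\infty)} = \lim_{m'\to\infty}\alpha^{(m')}/\|\alpha^{(m')}\|$. Then we have for each $Z$ on the
set $\{\pi_0(V) < 1\}$ that

$$\lim_{m'\to\infty}G\left(\frac{1 - \pi_0(V)}{\pi_0(V)}\,\|\alpha^{(m')}\|\,\frac{Z^T\alpha^{(m')}}{\|\alpha^{(m')}\|}\right)Z^T\beta^{(\infty)}
= \begin{cases}
M_1 Z^T\beta^{(\infty)} & \text{if } Z^T\beta^{(\infty)} > 0, \\
m_1 Z^T\beta^{(\infty)} & \text{if } Z^T\beta^{(\infty)} < 0, \\
0 & \text{if } Z^T\beta^{(\infty)} = 0.
\end{cases}$$

It follows by the dominated convergence theorem applied to each element of the
vector $\Phi_{mc}(\alpha)$ that

$$\begin{aligned}
0 = \lim_{m'\to\infty}\Phi_{mc}(\alpha^{(m')})^T\beta^{(\infty)}
&= P_0\lim_{m'\to\infty}\left\{G_{mc}(V;\alpha^{(m')}) - 1\right\}Z^T\beta^{(\infty)} \\
&= (M_1 - 1)P_0 I\{Z^T\beta^{(\infty)} > 0,\ \pi_0(V) < 1\}Z^T\beta^{(\infty)} \\
&\quad + (m_1 - 1)P_0 I\{Z^T\beta^{(\infty)} < 0,\ \pi_0(V) < 1\}Z^T\beta^{(\infty)}.
\end{aligned}$$

However, this is strictly positive since $m_1 < 1$ and $M_1 > 1$, which is a contradiction.
This completes the proof that $\hat\alpha_N \to_{P^*} 0$.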
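We apply Theorem 3.3.1 of [42] to show the asymptotic normality of $\hat\alpha_N$. For the
asymptotic equicontinuity condition, it follows by Taylor's theorem that

$$\mathbb{G}_N^{\pi}\left[G_{mc}(V;\hat\alpha_N)Z - G_{mc}(V;\alpha_0)Z\right]
= (\mathbb{P}_N^{\pi} - P_0)(\pi_0(V)^{-1} - 1)\dot G_{mc}(V;\alpha^*)Z^{\otimes 2}\sqrt{N}(\hat\alpha_N - \alpha_0)$$

for some $\alpha^*$ with $|\alpha^* - \alpha_0| \le |\hat\alpha_N - \alpha_0|$. This term is $o_{P^*}(1 + \sqrt{N}|\hat\alpha_N - \alpha_0|)$ if
$(\mathbb{P}_N^{\pi} - P_0)(\pi_0(V)^{-1} - 1)Z^{\otimes 2}\dot G_{mc}(V;\alpha) \to_{P^*} 0$, uniformly in $\alpha$. Let $\mathcal{A}_{mc,1} \subset \mathbb{R}^k$ be
a compact neighborhood of zero. Since $\hat\alpha_N$ is consistent, it suffices to show that
the set $\{(\pi_0^{-1}(V) - 1)Z^{\otimes 2}\dot G_{mc}(Z;\alpha) : \alpha \in \mathcal{A}_{mc,1}\}$ is Glivenko-Cantelli. Since
$|\pi_0^{-1}(V) - 1|$ and $Z$ are bounded, the VC subgraph class $\{(\pi_0^{-1}(V) - 1)Z^T\alpha : \alpha \in \mathcal{A}_{mc,1}\}$
(Lemma 2.6.15 of [42]) is $P_0$-Glivenko-Cantelli. Because $\dot G$ is continuous
and bounded, the set $\{\dot G_{mc}(Z;\alpha) : \alpha \in \mathcal{A}_{mc,1}\}$ is Glivenko-Cantelli by
the Glivenko-Cantelli preservation theorem of [40]. Apply the Glivenko-Cantelli
preservation theorem of [40] again to conclude that
$\{(\pi_0^{-1}(V) - 1)Z^{\otimes 2}\dot G_{mc}(Z;\alpha) : \alpha \in \mathcal{A}_{mc,1}\}$ is Glivenko-Cantelli. Hence, asymptotic
equicontinuity follows from Theorem 5.1. We show the weak convergence of the process
$\sqrt{N}(\Phi_{N,mc} - \Phi_{mc})(\alpha)$ at $\alpha_0 = 0$. Since $G_{mc}(V;\alpha_0) = 1$, it follows from the decomposition (10) of the
inverse probability weighted empirical processes in [6] that

$$\sqrt{N}(\Phi_{N,mc} - \Phi_{mc})(\alpha_0) = \sqrt{N}\Phi_{N,mc}(0) = \sqrt{N}(\mathbb{P}_N^{\pi} - \mathbb{P}_N)Z
\rightsquigarrow \sum_{j=1}^{J}\sqrt{\nu_j}\sqrt{\frac{1 - p_j}{p_j}}\,\mathbb{G}_j Z \quad (\text{by Theorem 5.3}).$$

The Fréchet derivative of $\Phi_{mc}$ at $\alpha_0$ is

$$\dot\Phi_{mc}(\alpha)\big|_{\alpha=\alpha_0} = \frac{\partial}{\partial\alpha}P_0(G_{mc}(V;\alpha) - 1)Z\Big|_{\alpha=0} = \dot G(0)P_0(\pi_0(V)^{-1} - 1)Z^{\otimes 2}.$$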



Thus, by Theorem 3.3.1 of [42] we obtain the expansion stated in the proposition. □

Here we give proofs of the theorems in Section 5.



Proof of Theorem 5.1. First consider $\mathbb{P}_N^{\pi}$. By the decomposition (10) of the inverse
probability weighted empirical processes in [6], we have

$$\|\mathbb{P}_N^{\pi} - P_0\|_{\mathcal{F}} \le \|\mathbb{P}_N - P_0\|_{\mathcal{F}}
+ \sum_{j=1}^{J}\frac{N_j}{N}\frac{N_j}{n_j}\left\|\frac{1}{N_j}\sum_{i=1}^{N_j}\left(\xi_{j,i} - \frac{n_j}{N_j}\right)\delta_{X_{j,i}}\right\|_{\mathcal{F}}.$$

The first term is $o_{P^*}(1)$ since $\mathcal{F}$ is Glivenko-Cantelli. Since $(N_j/N)(N_j/n_j) \to_{P^*}
\nu_j/p_j$, each summand in the second term is $o_{P^*}(1)$ by the bootstrap Glivenko-Cantelli
theorem, which is an easy corollary to Lemma 3.6.16 of [42].

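Consider $\mathbb{P}_N^{\pi,e}$. Because $\hat\alpha_N \to_{P^*} \alpha_0$ by Proposition 7.1, it suffices to consider
a compact neighborhood $K \subset \mathbb{R}^{J+k}$ of $\alpha_0$. Since $Z$ is bounded and $G_e$ is continuous,
$\{\pi_\alpha(V)\}^{-1} = \{G_e(\alpha^T Z)\}^{-1}$ is bounded in this neighborhood. Because $\alpha$
is a vector in $\mathbb{R}^{J+k}$ and $G_e$ is monotone, $\{\{G_e(\cdot;\alpha)\}^{-1} : \alpha \in K\}$ is a VC subgraph
class by Lemmas 2.6.15 and 2.6.18 of [42]. Boundedness of $G_e$ implies that the
set

$$\{\pi_0\{G_e(\cdot;\alpha)\}^{-1}f : f \in \mathcal{F},\ \alpha \in K\}$$

is $P_0$-Glivenko-Cantelli by the Glivenko-Cantelli preservation theorem [40]. Since
$\hat\alpha_N \to_{P^*} \alpha_0$, we have by (5.26) that $\|\mathbb{P}_N^{\pi,e} - P_0\|_{\mathcal{F}} \to_{P^*} 0$, by recognizing that
$\mathbb{P}_N^{\pi,e}f = \mathbb{P}_N^{\pi}\{\pi_0(V)/G_e(Z^T\hat\alpha_N)\}f$.

Consider $\mathbb{P}_N^{\pi,mc}$. The cases for $\mathbb{P}_N^{\pi,c}$ and $\mathbb{P}_N^{\pi,cc}$ are similar. We verified in the
proof of Proposition 7.1 that $\{G_{mc}(\cdot;\alpha) : \alpha \in \mathbb{R}^k\}$ is a VC subgraph class.
Boundedness of $G$ implies that the set

$$\{G_{mc}(\cdot;\alpha)f : f \in \mathcal{F},\ \alpha \in \mathbb{R}^k\}$$

is $P_0$-Glivenko-Cantelli by the Glivenko-Cantelli preservation theorem [40]. Since
$\hat\alpha_N$ converges to zero in probability by Proposition 7.1, the result follows by
(5.26).

□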

Several lemmas are required for the proof of Theorem 5.2. 

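Lemma 7.1. Let $\mathcal{F}$ be a class of functions with $P_0|f| < \infty$ for every $f \in \mathcal{F}$.
Then,

$$E^*\left\|\sqrt{\frac{N_j}{N}}\,I(N_j > 0)\,\mathbb{G}_{j,N_j}\right\|_{\mathcal{F}} \lesssim E^*\|\mathbb{G}_N\|_{\mathcal{F}}, \quad \text{for each } j = 1, \ldots, J.$$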



Proof. Let $\epsilon_i$, $i = 1, \ldots, N$, be independent Rademacher variables, independent
of $X_i$, $i = 1, \ldots, N$, and $N_j$. It follows from the symmetrization inequality
(Lemma 2.3.6) of [42] that



E*\\Gn\\t > E* 



1 ^ 
Rewrite this and use Jensen's inequality again with $E[\epsilon f(X)] = 0$ to obtain



E* 



> E* 



I{N, > 0) 



N 



1=1 

TV, 



1 



Here we implicitly change the law. This can be justified by Proposition A.1 of
[6].

Now applying Lemma 2.3.6 of [42] to the $j$th stratum, this is further
bounded below, up to some constant, by



E* 



AT, 



3 i=l 



= E* 



IiN,>0),/N~/NG,^N, 



J" 



□ 




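Proof of Lemma 5.1. This follows the proof of Lemma 3.6.7 of [42] up to the
last line. Since the $\xi_i$'s can be split into their positive and negative parts, we
only consider the case where they are nonnegative. Thus for any $1 \le n_0 \le n$,

$$\begin{aligned}
E\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_i Z_i\right\|_{\mathcal{F}}
&\le E\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n_0-1}\xi_{(i)}Z_i\right\|_{\mathcal{F}}
+ E\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{n}\xi_{(i)}Z_i\right\|_{\mathcal{F}} \\
&\le E\max_{1\le i\le n}\frac{\xi_i}{\sqrt{n}}\sum_{i=1}^{n_0-1}E^*\|Z_i\|_{\mathcal{F}}
+ E\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{n}\xi_{(i)}Z_i\right\|_{\mathcal{F}},
\end{aligned}$$

where $\xi_{(i)}$, $i = 1, \ldots, n$, are the reverse order statistics of $\xi_i$, $i = 1, \ldots, n$. To
bound the second term, we substitute $\xi_{(i)} = \sum_{k=i}^{n}(\xi_{(k)} - \xi_{(k+1)})$ with $\xi_{(n+1)} = 0$,
and change the order of summation to obtain

$$E\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{n}\xi_{(i)}Z_i\right\|_{\mathcal{F}}
= E\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{n}\sum_{k=i}^{n}(\xi_{(k)} - \xi_{(k+1)})Z_i\right\|_{\mathcal{F}}
= E\left\|\sum_{k=n_0}^{n}(\xi_{(k)} - \xi_{(k+1)})\frac{1}{\sqrt{n}}\sum_{i=n_0}^{k}Z_i\right\|_{\mathcal{F}}.$$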
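
The interchange of summation used here is a summation-by-parts identity, $\sum_{i=n_0}^{n}\xi_{(i)}z_i = \sum_{k=n_0}^{n}(\xi_{(k)} - \xi_{(k+1)})\sum_{i=n_0}^{k}z_i$ with $\xi_{(n+1)} = 0$. The following Python sketch checks it numerically for scalar $z_i$; the function name `abel_sum_check` and the randomly generated inputs are illustrative assumptions only.

```python
import random

def abel_sum_check(n=12, n0=3, seed=1):
    """Return both sides of the summation-by-parts identity for random inputs."""
    rng = random.Random(seed)
    # Reverse order statistics xi_(1) >= ... >= xi_(n), padded with xi_(n+1) = 0.
    xi = sorted((rng.uniform(0.0, 2.0) for _ in range(n)), reverse=True) + [0.0]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Left side: sum over i = n0..n of xi_(i) * z_i (1-based indices).
    lhs = sum(xi[i - 1] * z[i - 1] for i in range(n0, n + 1))
    # Right side: sum over k = n0..n of (xi_(k) - xi_(k+1)) * (z_{n0} + ... + z_k).
    rhs = 0.0
    partial = 0.0
    for k in range(n0, n + 1):
        partial += z[k - 1]
        rhs += (xi[k - 1] - xi[k]) * partial
    return lhs, rhs

lhs, rhs = abel_sum_check()
print(abs(lhs - rhs) < 1e-9)
```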






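It follows from the triangle inequality and the independence of the $\xi_i$'s and $Z_i$'s
that this is bounded by

$$\begin{aligned}
\sum_{k=n_0}^{n}E(\xi_{(k)} - \xi_{(k+1)})\,E^*\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{k}Z_i\right\|_{\mathcal{F}}
&\le \sum_{k=n_0}^{n}E(\xi_{(k)} - \xi_{(k+1)})\max_{n_0\le m\le n}E^*\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{m}Z_i\right\|_{\mathcal{F}} \\
&\le (u - l)\max_{n_0\le m\le n}E^*\left\|\frac{1}{\sqrt{n}}\sum_{i=n_0}^{m}Z_i\right\|_{\mathcal{F}},
\end{aligned}$$

using the boundedness of the $\xi_i$'s in the last line. The proof for the negative
parts of the $\xi_i$'s is similar and the inequality follows. □

Lemma 7.2. For an arbitrary set $\mathcal{F}$ of integrable functions,

$$E^*\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}} \lesssim E^*\|\mathbb{G}_N\|_{\mathcal{F}}.$$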



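Proof. We decompose $\mathbb{G}_N^{\pi}$ as in (2.1); thus

$$E^*\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}}
= E^*\left\|\mathbb{G}_N + \sum_{j=1}^{J}\sqrt{\frac{N_j}{N}}\frac{N_j}{n_j}\mathbb{G}^{\xi}_{j,N_j}\right\|_{\mathcal{F}}
\le E^*\|\mathbb{G}_N\|_{\mathcal{F}} + \sum_{j=1}^{J}E^*\left\|\sqrt{\frac{N_j}{N}}\frac{N_j}{n_j}\mathbb{G}^{\xi}_{j,N_j}\right\|_{\mathcal{F}}.$$

It therefore suffices to show that each $E^*\|m_j\mathbb{G}^{\xi}_{j,N_j}\|_{\mathcal{F}}$ is bounded up to some
constant by $E^*\|\mathbb{G}_N\|_{\mathcal{F}}$ where $m_j = (N_j/N)^{1/2}(N_j/n_j)$.

Rewrite $\mathbb{G}^{\xi}_{j,N_j}$ as

$$\mathbb{G}^{\xi}_{j,N_j} = \frac{1}{\sqrt{N_j}}\sum_{i=1}^{N_j}\xi_{j,i}\left(\delta_{X_{j,i}} - \mathbb{P}_{j,N_j}\right).$$

Now we condition on $\underline{N} = (N_1, \ldots, N_J)$, and write $E_{\underline{N}}$ for $E(\cdot|\underline{N})$. Since $\xi_{j,i} \in
\{0, 1\}$, it follows by the multiplier inequality of Lemma 5.1 applied conditionally
with $n_0 = 1$ and $Z_i = m_j(\delta_{X_{j,i}} - \mathbb{P}_{j,N_j})$ that $E_{\underline{N}}\|m_j\mathbb{G}^{\xi}_{j,N_j}\|_{\mathcal{F}}$ is bounded by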



(1 — 0) max En_ 



k 



■J l—l 



max En 

l<k<Ni — 



k 

E 



Note that $N_j/n_j \le \sigma^{-1}$ for some $\sigma > 0$ by assumption so that we can replace
$N_j/n_j$ by $\sigma^{-1}$ in the last display to obtain an upper bound. Then, apply the
triangle inequality to further bound this by



max E*f^ 

l<k<Ni — 



max Etf 

l<k<Ni — 



J" 



^\m,N,~Poy)\\^ 



Since $\delta_{X_{j,i}} - P_{0|j}$ has mean zero, it follows by Jensen's inequality that the first
term is bounded by



E 



N 



1 



* 




771* 











J" 



The second term is bounded by E 



N 



. Now compute unconditionally and apply Lemma 7.1 to find that both terms are
bounded, up to a constant, by $E^*\|\mathbb{G}_N\|_{\mathcal{F}}$. □



Proof of Theorem 5.2. It follows by Lemma 7.2 and the assumption on $E^*\|\mathbb{G}_N\|_{\mathcal{M}_\delta}$
that

$$E^*\|\mathbb{G}_N^{\pi}\|_{\mathcal{M}_\delta} \lesssim E^*\|\mathbb{G}_N\|_{\mathcal{M}_\delta} \lesssim \phi_N(\delta).$$

By application of Theorem 3.2.5 of [42], we conclude that the conclusion of (1)
of the theorem holds.

For the second statement, note that Theorem 3.2 of [24] holds in a general
setting where $P_0 m_{\theta,\eta}$ and $\mathbb{P}_n m_{\theta,\eta}$ are replaced by the deterministic function
$M(\theta,\eta)$ and the stochastic process $M_n(\theta,\eta)$, respectively. Our parameters $\alpha$
and $\theta$ play the roles of their $\theta$ and $\eta$, respectively. Our choice of $M$ and $M_N$ is
$P_0 G_{mc}(V;\alpha)m_\theta$ and $\mathbb{P}_N^{\pi,mc}m_\theta$. The condition (5.29) corresponds to (3.5) of [24].
The condition (5.30) together with Lemma 7.2 verifies their (3.6). Apply their
Theorem 3.2 to obtain $d(\hat\theta_{N,mc},\theta_0) \le O_{P^*}(\delta_N + |\hat\alpha_N - \alpha_0|) = O_{P^*}(\delta_N)$. The
cases for $\hat\theta_{N,e}$, $\hat\theta_{N,c}$ and $\hat\theta_{N,cc}$ are similar. □



Proof of Lemma 5.2. We consider modified calibration. The other three cases are
similar. Because $G(0) = 1$ and $Z$ is bounded, consistency of $\hat\alpha_N$ implies that
there exists $\mathcal{A}_{mc,2} \subset \mathcal{A}_{mc}$ such that for some fixed constant $C > 0$, $G_{mc}(v;\alpha) \ge
C$ and $\dot G_{mc}(v;\alpha) \ge C$ for every $\alpha \in \mathcal{A}_{mc,2}$ and $P(\hat\alpha_N \in \mathcal{A}_{mc,2}) \to 1$. Then, for
arbitrary $\alpha \in \mathcal{A}_{mc,2}$,



PoGmc{V] a){mg - me„) = PoG,nc{V; a) \o, 
^ -1 



Pe 



Gmc{v;a) -Pgf^)^ +P9 -poo^dfj. 

^ i {py^ - Pei^fd^i+ {G,nciv;a) ~l}{pe~Pe„)dfJ. 



^ -Gh^{pe,P9o) + J Grnc{v;a*){TrQ^{v) ~ l)v'^{pe - Pdo)dnia - ao), 

where $\alpha^*$ is some convex combination of $\alpha$ and $\alpha_0$. Because the integral in
the last display is a bounded row vector, the second term in the last display is
bounded by $|\alpha - \alpha_0|$ up to some constant. Thus, the condition (5.29) holds. □

The following lemma is useful when showing asymptotic equicontinuity of
processes involving $\mathbb{P}_N^{\pi,e}$, $\mathbb{P}_N^{\pi,c}$, $\mathbb{P}_N^{\pi,mc}$ and $\mathbb{P}_N^{\pi,cc}$.



Lemma 7.3. Suppose Conditions 3.1 and 3.2 hold. Let $\mathcal{F}$ be a Glivenko-Cantelli
class. Then



sup 



N{¥n - Po) 



= op.(l), (7.37) 



where $\pi_{\alpha,\#}$ is either an estimated or calibrated probability (with modified or centered
calibration).

Proof. We only consider modified calibration. The cases for estimated weights 
and (centered) calibration are similar. It follows by Taylor's theorem that 



sup 



^/N{FN - Po) 



sup 



:n-Po){ i7^o\y) - l)Z^G,nc{Z;a*)f] ^/N\aN - ao|, 



for some $\alpha^*$ with $|\alpha^* - \alpha_0| \le |\hat\alpha_N - \alpha_0|$. Because $\sqrt{N}(\hat\alpha_N - \alpha_0) = O_{P^*}(1)$ by
Proposition 7.1, it follows that (7.37) is $o_{P^*}(1)$ by Theorem 5.1 and Proposition
7.1 if the set $\{(\pi_0(V)^{-1} - 1)Z^T\dot G((\pi_0^{-1}(V) - 1)Z^T\alpha)f : \alpha \in \mathcal{A}_{mc,3},\ f \in \mathcal{F}\}$ is $P_0$-Glivenko-Cantelli
where $\mathcal{A}_{mc,3} \subset \mathcal{A}_{mc}$ is some compact set containing $\alpha_0 = 0$.
This is easily verified in the same way as in the proof of Proposition 7.1. □

Proof of Theorem 5.3. The result (5.31) follows from [6]. Consider the IPW empirical
process with modified calibration. It follows by Taylor's theorem that

$$\mathbb{G}_N^{\pi,mc}f - \mathbb{G}_N^{\pi}f
= \sqrt{N}(\mathbb{P}_N^{\pi} - P_0)\{G_{mc}(V;\hat\alpha_N) - 1\}f
+ P_0\left\{\frac{1 - \pi_0(V)}{\pi_0(V)}Z^T\dot G_{mc}(V;\alpha^*)f\right\}\sqrt{N}(\hat\alpha_N - \alpha_0), \qquad (7.38)$$

where $\alpha^*$ is some convex combination of $\hat\alpha_N$ and $\alpha_0$. The first term is $o_{P^*}(1)$
by Lemma 7.3. Since $(\pi_0(V)^{-1} - 1)Z^T\dot G_{mc}$ is bounded and $f$ is integrable, it
follows from the dominated convergence theorem that



Apply the result (5.31) and Proposition 7.1 to conclude the finite-dimensional 
convergence 

J 



/I — TTnll/l „• \. .1 I — TTnll/l „„l 



52 



J 



J 



j = l V 3 = 1 V 



J 



Next, we prove asymptotic equicontinuity of $\mathbb{G}_N^{\pi,mc}$ with respect to the metric
$\rho_{mc}$ defined by

$$\rho_{mc}^2(f, g) = P_0(f - g)^2 + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\mathrm{Var}_{0|j}(f - g).$$

First recall that $\mathbb{G}_N^{\pi}$ is asymptotically equicontinuous with respect to the metric
$\rho$ defined by

$$\rho^2(f, g) = \sigma^2_{P_0}(f - g) + \sum_{j=1}^{J}\nu_j\frac{1 - p_j}{p_j}\mathrm{Var}_{0|j}(f - g).$$

The part $\sigma^2_{P_0}(f - g)$ corresponds to the empirical process $\mathbb{G}_N = \sqrt{N}(\mathbb{P}_N - P_0)$
in the decomposition (2.1) of the inverse probability weighted empirical
processes. However, this empirical process $\mathbb{G}_N$ is asymptotically equicontinuous
with respect to the $L_2(P_0)$-metric with an assumption $\|P_0\|_{\mathcal{F}} < \infty$ in view of
Problem 2.1.2 of [42]. Thus, $\mathbb{G}_N^{\pi}$ is asymptotically equicontinuous with respect
to $\rho_{mc}$. Now, it remains to verify the asymptotic equicontinuity of $\mathbb{G}_N^{\pi,mc} - \mathbb{G}_N^{\pi}$.
Let $h_N \in \mathcal{F}_{\delta_N} = \{f - g : f, g \in \mathcal{F},\ \rho_{mc}(f, g) < \delta_N\}$ for an arbitrary sequence
$\delta_N \downarrow 0$. In view of (7.38)

$$(\mathbb{G}_N^{\pi,mc} - \mathbb{G}_N^{\pi})h_N = o_{P^*}(1) + P_0\left(\frac{1 - \pi_0(V)}{\pi_0(V)}Z^T\dot G_{mc}(V;\alpha^*)h_N\right)O_{P^*}(1),$$

where $\alpha^*$ is some convex combination of $\hat\alpha_N$ and $\alpha_0$. Because each element of
the vector $(\pi_0(V)^{-1} - 1)Z^T\dot G_{mc}(V;\alpha^*)$ is bounded, it follows from the Cauchy-Schwarz
inequality that each element of $P_0\{(\pi_0(V)^{-1} - 1)Z^T\dot G_{mc}(V;\alpha^*)h_N\}$ is
bounded up to some constant by $\{P_0(h_N^2)\}^{1/2}$. Since $\rho_{mc}(f, g) \to 0$ implies
$P_0(f - g)^2 \to 0$, we have $P_0 h_N^2 \to 0$ as $N \to \infty$. This verifies the asymptotic equicontinuity
of $\mathbb{G}_N^{\pi,mc}$ and hence completes showing its weak convergence.

The cases for $\mathbb{G}_N^{\pi,e}$, $\mathbb{G}_N^{\pi,c}$ and $\mathbb{G}_N^{\pi,cc}$ follow analogously. □

Proof of Theorem 5.4. Since $\mathcal{F}$ is Donsker, it follows by Lemma 2.3.11 of [42]
that $E^*\|\mathbb{G}_N\|_{\mathcal{F}_{\delta_N}} \to 0$ for every sequence $\delta_N \downarrow 0$. Thus, the result follows from
Lemma 7.2. Apply Markov's inequality to obtain $\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}_{\delta_N}} = o_{P^*}(1)$. For the
second statement, consider the expansion (7.38) of $\mathbb{G}_N^{\pi,mc}f - \mathbb{G}_N^{\pi}f$ with $f \in \mathcal{F}_{\delta_N}$.
The first term is $o_{P^*}(1)$ by Lemma 7.3. Since $f$ converges to zero in $L_2(P_0)$, the
second term is $o_{P^*}(1)$ by the dominated convergence theorem and Proposition
7.1. Apply the triangle inequality to conclude $\|\mathbb{G}_N^{\pi,mc}\|_{\mathcal{F}_{\delta_N}} = o_{P^*}(1)$.

The proofs for $\mathbb{G}_N^{\pi,e}$, $\mathbb{G}_N^{\pi,c}$ and $\mathbb{G}_N^{\pi,cc}$ are similar. □

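Proof of Lemma 5.3. Without loss of generality, assume that $\hat\theta_N$ takes its values in $\Theta_\delta = \{\theta \in \Theta : \|\theta - \theta_0\| < \delta\}$ because of consistency of $\hat\theta_N$ for $\theta_0$. Define a function $f : \ell^\infty(\Theta_\delta \times \mathcal{H}) \times \Theta_\delta \to \ell^\infty(\mathcal{H})$ by $f(z, \theta)h = z(\theta, h)$. Note that $f$ is continuous at every point $(z, \theta_0)$ such that $\|z(\theta, h) - z(\theta_0, h)\|_{\mathcal{H}} \to 0$ as $\theta \to \theta_0$. To see this, suppose $z_N \to z$ and $\theta_N \to \theta_0$. Then, for a fixed $\epsilon > 0$, there exists $N_0$ such that $\|z_N - z\| < \epsilon$ and $\|\theta_N - \theta_0\| < \epsilon$ for $N > N_0$ and, by the assumed continuity of $z$ at $\theta_0$ (enlarging $N_0$ if necessary), $\|z(\theta_N, h) - z(\theta_0, h)\|_{\mathcal{H}} < \epsilon$. For $N > N_0$, we have

\begin{align*}
\|f(z_N, \theta_N) - f(z, \theta_0)\|_{\mathcal{H}}
&\le \|f(z_N, \theta_N) - f(z, \theta_N)\|_{\mathcal{H}} + \|f(z, \theta_N) - f(z, \theta_0)\|_{\mathcal{H}} \\
&\le \sup_{\theta, h} |z_N(\theta, h) - z(\theta, h)| + \|z(\theta_N, h) - z(\theta_0, h)\|_{\mathcal{H}} \\
&\le 2\epsilon.
\end{align*}

Define a stochastic process $Z_N$ indexed by $\Theta_\delta \times \mathcal{H}$ by

\[
Z_N(\theta, h) = \mathbb{G}_N^{\pi}(\psi_{\theta,h} - \psi_{\theta_0,h}).
\]

Because $\{\psi_{\theta,h} - \psi_{\theta_0,h} : \|\theta - \theta_0\| < \delta, \theta \in \Theta, h \in \mathcal{H}\}$ is Donsker, Theorem 5.3 implies that the sequence $Z_N$ converges in $\ell^\infty(\Theta_\delta \times \mathcal{H})$ to a tight Gaussian process $Z$ given by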

J 



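This process has continuous sample paths with respect to the semimetric $\rho$ given by

\[
\rho^2((\theta_1, h_1), (\theta_2, h_2)) = P_0(\psi_{\theta_1,h_1} - \psi_{\theta_0,h_1} - \psi_{\theta_2,h_2} + \psi_{\theta_0,h_2})^2
\]

because $(\Theta_\delta \times \mathcal{H}, \rho)$ is totally bounded and $Z$ is uniformly $\rho$-continuous. To see the latter, note that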



((6'i, hi), {02, h2)) > P\^{ipeuhi - -ipeoM ~ i^02M + i^OoM^ ^ ^ "^j} ^3 

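for each $j = 1, \ldots, J$. By assumption

\[
\sup_{h \in \mathcal{H}} \rho^2((\theta, h), (\theta_0, h)) = \sup_{h \in \mathcal{H}} P_0(\psi_{\theta,h} - \psi_{\theta_0,h})^2 \to 0,
\]

as $\theta \to \theta_0$. Thus, $f$ is continuous at almost all sample paths of $Z$.

By Slutsky's theorem, $(Z_N, \hat\theta_N) \rightsquigarrow (Z, \theta_0)$. By the continuous mapping theorem, $Z_N(\hat\theta_N) = f(Z_N, \hat\theta_N) \rightsquigarrow f(Z, \theta_0) = 0$ in $\ell^\infty(\mathcal{H})$.

The other cases for $\mathbb{G}_N^{\pi,e}$, $\mathbb{G}_N^{\pi,c}$, $\mathbb{G}_N^{\pi,mc}$ and $\mathbb{G}_N^{\pi,cc}$ follow analogously; see the proof of Theorem 5.3. □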

With the results of Section 5 in hand, we are ready to prove the main theo- 
rems. 

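Proof of Theorem 3.1. The asymptotic distribution of $\hat\theta_N$ is derived in [6]. Here we derive the asymptotic distribution of $\hat\theta_{N,mc}$, which is a solution of the calibrated weighted likelihood equations with modified calibration

\[
\Psi^{\pi}_{N,1,mc}(\theta, \eta, \alpha) = \mathbb{P}_N^{\pi} G_{mc}(V; \alpha) \dot\ell_{\theta,\eta} = 0,
\]

\[
\Psi^{\pi}_{N,2,mc}(\theta, \eta, \alpha)h = \mathbb{P}_N^{\pi} G_{mc}(V; \alpha)(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h) = 0,
\]

for all $h \in \mathcal{H}$ with $\alpha = \hat\alpha_N$. Let $\Psi_{mc}(\theta, \eta, \alpha) = (\Psi_{1,mc}(\theta, \eta, \alpha), \Psi_{2,mc}(\theta, \eta, \alpha))$ where

\[
\Psi_{1,mc}(\theta, \eta, \alpha) = P_0 G_{mc}(V; \alpha) \dot\ell_{\theta,\eta},
\]

\[
\Psi_{2,mc}(\theta, \eta, \alpha)h = P_0 G_{mc}(V; \alpha)(B_{\theta,\eta}h - P_{\theta,\eta}B_{\theta,\eta}h).
\]

The derivative map of $\Psi_{mc}$ with respect to $(\theta, \eta)$ at $(\theta_0, \eta_0, \alpha)$ has components $P_0\{G_{mc}(V; \alpha)\dot\psi_{ij,\theta_0,\eta_0,h}\}$, $i, j = 1, 2$.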

Our proof proceeds by verifying the conditions of Theorem 1 of [7]. The weak convergence of $\sqrt{N}(\Psi^{\pi}_{N,j,mc} - \Psi_{j,mc})(\theta_0, \eta_0, \alpha_0)$ follows from Theorem 5.3. The asymptotic equicontinuity conditions

\[
\sup_{(\theta,\eta)} \left\| \sqrt{N}(\Psi^{\pi}_{N,j,mc} - \Psi_{j,mc})(\theta, \eta, \hat\alpha_N) - \sqrt{N}(\Psi^{\pi}_{N,j,mc} - \Psi_{j,mc})(\theta, \eta, \alpha_0) \right\|_{\mathcal{H}} = o_{P^*}(1)
\]

for $j = 1, 2$, follow from Lemma 7.3. The other asymptotic equicontinuity
condition



■H 



for j = 1, 2, follows from the Condition 3.4 and Lemma 5.3. Thus conditions (2) 
and (3) of [7] are satisfied. 

The Fréchet differentiability of the map $(\theta, \eta) \mapsto \Psi_{j,mc}(\theta, \eta, \alpha)$ uniformly over the neighborhood of $\alpha_0$ follows by Condition 3.5 and boundedness of $G$:



*mc(6', ?7, a)h - *,„c(6'o, ?7o, a)h - ^,nc{{e, rj) - (6*0, 770)) 



H 



sup 



E 



hen 



E^ipe.-n.h - ip9o,no-h - ''Pea. voAi^iV) - {^o,Vq))} 

= op. iWie, 7^) ~ieo,m)\\)- 

The Frechet derivative \l/a,mc of the map a {^'mc(^', : h e "H} is 



1/2 



Now proceed in the same way as [7] to obtain 

= \^{0N -0q) + E 



iV(dAr - ao) + op.(l). 



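Because $\sqrt{N}(\hat\theta_N - \theta_0) = \mathbb{G}_N^{\pi}\tilde\ell_{\theta_0,\eta_0} + o_{P^*}(1)$ ((16) of [6]), it follows from (7.38) and consistency and asymptotic normality of $\hat\alpha_N$ that $\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \mathbb{G}_N^{\pi,mc}\tilde\ell_{\theta_0,\eta_0} + o_{P^*}(1)$. Apply Theorem 5.3 to complete the proof.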

The other three cases are similar. □ 

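Lemma 7.4. Let $Z_1, Z_2, \ldots$ be i.i.d. stochastic processes indexed by $\mathcal{F}_N$ with $E^*\|Z_1\|_{\mathcal{F}_N}$ uniformly bounded in $N$. Suppose that $\|S_N\|_{\mathcal{F}_N} = \|\sum_{i=1}^N Z_i\|_{\mathcal{F}_N} = o_{P^*}(1)$. Then $E^*\|S_N\|_{\mathcal{F}_N} \to 0$.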

Proof. Fix $\epsilon > 0$. Let $Y_1, Y_2, \ldots$ be independent copies of $Z_1, Z_2, \ldots$ and define $T_N = \sum_{i=1}^N Y_i$ and $U_N = T_N - S_N$. Since $\|U_N\|_{\mathcal{F}_N} = o_{P^*}(1)$, $\limsup_N P^*(\|U_N\|_{\mathcal{F}_N} > x\sqrt{N}) \le \limsup_N P^*(\|U_N\|_{\mathcal{F}_N} \ge x) = 0$ by the portmanteau theorem. This implies that there exists $N_0$ such that for $N > N_0$

\[
P^*(\|U_N\|_{\mathcal{F}_N} > x\sqrt{N}) < \epsilon/x^2.
\]

Since $U_N$ is a sum of independent symmetric processes, we can apply Lévy's inequality to obtain

\[
P^*\left( \max_{1 \le i \le N} \|Z_i - Y_i\|_{\mathcal{F}_N} > x\sqrt{N} \right) \le 2P^*(\|U_N\|_{\mathcal{F}_N} > x\sqrt{N}) < 2\epsilon/x^2.
\]

In view of Problem 2.3.2 of [42], for every $N > N_0$,

\[
x^2 N P^*(\|Z_1 - Y_1\|_{\mathcal{F}_N} > x\sqrt{N}) \le 4\epsilon.
\]

Note that on the event that $\|Z_1\|_{\mathcal{F}_N} > x$, we have

\[
\beta_N(x) \equiv P_Y(\|Y_1\|_{\mathcal{F}_N} \le x/2) \le P_Y(\|Z_1 - Y_1\|_{\mathcal{F}_N} > x/2).
\]

Integrating both sides with respect to $Z_1$ gives

\[
\beta_N(x) P^*(\|Z_1\|_{\mathcal{F}_N} > x) \le P^*(\|Z_1 - Y_1\|_{\mathcal{F}_N} > x/2).
\]

By Markov's inequality,

\[
\beta_N(x) = 1 - P^*(\|Y_1\|_{\mathcal{F}_N} > x/2) \ge 1 - 2x^{-1}E\|Y_1\|_{\mathcal{F}_N}.
\]

Since $E\|Y_1\|_{\mathcal{F}_N}$ is uniformly bounded in $N$, it follows that, for $x$ sufficiently large, $\beta_N(x)^{-1}$ is uniformly bounded in $N$ and, therefore, $P^*(\|Z_1\|_{\mathcal{F}_N} > x\sqrt{N})$ is bounded by $P^*(\|Z_1 - Y_1\|_{\mathcal{F}_N} > x\sqrt{N})$ up to some constant for every $N$. Hence this proves that $P^*(\|Z_1\|_{\mathcal{F}_N} > x) = o(x^{-2})$.

Now we apply the Hoffmann-Jørgensen inequality to obtain

\[
E^*\|S_N\|_{\mathcal{F}_N} \lesssim E^*\max_{i \le N}\|Z_i\|_{\mathcal{F}_N} + G_N^{-1}(u)
\]

for an absolute constant $u$, where

\[
G_N(t) = P^*(\|S_N\|_{\mathcal{F}_N} \le t).
\]

Since $P^*(\|Z_1\|_{\mathcal{F}_N} > x) = o(x^{-2})$, $E^*\max_{i \le N}\|Z_i\|_{\mathcal{F}_N} \to 0$ in view of Problem 2.3.3 of [42]. The second term goes to zero since $\|S_N\|_{\mathcal{F}_N} = o_{P^*}(1)$. This completes the proof. □
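The symmetrization step above relies on Lévy's inequality in the form $P(\max_i \|X_i\| > x) \le 2P(\|\sum_i X_i\| > x)$ for independent symmetric summands. The following is a minimal Monte Carlo sketch of that inequality in the simplest real-valued case, where the norm is the absolute value. The function name `levy_check`, the Gaussian summands, and the values of `n`, `x`, `reps` and `seed` are illustrative assumptions, not part of the paper.

```python
import random


def levy_check(n=20, x=5.0, reps=20000, seed=0):
    """Monte Carlo estimates of both sides of Levy's inequality

        P(max_i |X_i| > x)  <=  2 * P(|X_1 + ... + X_n| > x)

    for independent symmetric summands X_i = Z_i - Y_i.
    """
    rng = random.Random(seed)
    max_exceeds = 0
    sum_exceeds = 0
    for _ in range(reps):
        # Symmetrized summands: differences of two independent N(0, 1) draws,
        # mimicking Z_i - Y_i with Y_i an independent copy of Z_i.
        xs = [rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0) for _ in range(n)]
        if max(abs(v) for v in xs) > x:
            max_exceeds += 1
        if abs(sum(xs)) > x:
            sum_exceeds += 1
    # Left-hand side estimate, and twice the right-hand probability estimate.
    return max_exceeds / reps, 2.0 * sum_exceeds / reps


lhs, rhs = levy_check()
print(f"P(max|X_i| > x) ~ {lhs:.4f}   2 P(|sum| > x) ~ {rhs:.4f}")
```

The simulation only illustrates the inequality for one hypothetical choice of summands; the proof uses it for norms of processes indexed by $\mathcal{F}_N$.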

Proof of Lemma 5.4. Define $\mathcal{G}_N = \{N^{-1/2}f : f \in \mathcal{F}_N\}$. We apply Lemma 7.4 with $Z_i$ and $\mathcal{F}_N$ in Lemma 7.4 replaced by $\delta_{X_i} - P_0$ and $\mathcal{G}_N$, respectively. The uniform boundedness condition of Lemma 7.4 is satisfied, because $E^*\|\delta_{X_1} - P_0\|_{\mathcal{F}_N} < \infty$ for $N > N_0$, and this expectation is decreasing in $N > N_0$. Thus, $E^*\|\mathbb{G}_N\|_{\mathcal{F}_N} = E^*\|\sum_{i=1}^N(\delta_{X_i} - P_0)\|_{\mathcal{G}_N} \to 0$. Apply Lemma 7.2 and Markov's inequality to obtain $\|\mathbb{G}_N^{\pi}\|_{\mathcal{F}_N} = o_{P^*}(1)$.

For the IPW process with modified calibration, consider the expansion (7.38) of $(\mathbb{G}_N^{\pi,mc} - \mathbb{G}_N^{\pi})f$. Then the first term is $o_{P^*}(1)$ by Lemma 7.3. Suppose that $f = f_N \in \mathcal{F}_N$ converges to zero pointwise. Since $(\pi_0(V)^{-1} - 1)ZG_{mc}$ is bounded, the second term in the expansion (7.38) is $o_{P^*}(1)$ by the dominated convergence theorem and Proposition 7.1. Suppose instead that $f = f_N \in \mathcal{F}_N$ converges to zero in $L_1(P_0)$. Then the same conclusion, that the second term in the expansion (7.38) is $o_{P^*}(1)$, follows directly. Apply the triangle inequality to conclude $\|\mathbb{G}_N^{\pi,mc}\|_{\mathcal{F}_N} = o_{P^*}(1)$.

The proofs for $\mathbb{G}_N^{\pi,e}$, $\mathbb{G}_N^{\pi,c}$ and $\mathbb{G}_N^{\pi,cc}$ are similar. □

imsart-generic ver. 2011/05/20 file: wlecoxv-arxiv3.tex date: January 19, 2012

Saegusa and Wellner/ Weighted Likelihood: two-phase sampling

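Proof of Theorem 3.2. We only consider the WLE with modified calibration, $\hat\theta_{N,mc}$. The other four cases are similar.

We evaluate the stochastic order of $\sqrt{N}\mathbb{P}_N^{\pi,mc}\dot\ell_{\theta_0,\eta_0} + \sqrt{N}P_0\dot\ell_{\hat\theta_{N,mc},\hat\eta_{N,mc}}$. Because $\mathbb{P}_N^{\pi,mc}\dot\ell_{\hat\theta_{N,mc},\hat\eta_{N,mc}} = o_{P^*}(N^{-1/2})$ by assumption and $P_0\dot\ell_{\theta_0,\eta_0} = 0$, we have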


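Let $\delta_N \downarrow 0$ be arbitrary and define $\mathcal{F}_N = \{\dot\ell_{\theta,\eta} - \dot\ell_{\theta_0,\eta_0} : |\theta - \theta_0| \le \delta_N, \|\eta - \eta_0\| \le N^{-\beta}\}$. Then $f \in \mathcal{F}_N$ converges to zero either pointwise or in $L_1(P_0)$ by Condition 3.8 as $N \to \infty$. Moreover, it follows from Condition 3.8 that $\|\mathbb{G}_N\|_{\mathcal{F}_N} = o_{P^*}(1)$ and that there exists some $N_0$ such that $\mathcal{F}_N$ is Glivenko-Cantelli for $N > N_0$. Apply Lemma 5.4 to obtain $\|\mathbb{G}_N^{\pi,mc}\|_{\mathcal{F}_N} = o_{P^*}(1)$ and conclude $\sqrt{N}\mathbb{P}_N^{\pi,mc}\dot\ell_{\theta_0,\eta_0} + \sqrt{N}P_0\dot\ell_{\hat\theta_{N,mc},\hat\eta_{N,mc}} = o_{P^*}(1)$. Similarly, $\sqrt{N}\mathbb{P}_N^{\pi,mc}B_{\theta_0,\eta_0}[h^*] + \sqrt{N}P_0B_{\hat\theta_{N,mc},\hat\eta_{N,mc}}[h^*] = o_{P^*}(1)$. These stochastic orders and Condition 3.9 imply that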

-Po {-(■eo,mX^^Q,rio{^N ,mc ^ Oq) + Beo.no[VN,mc - '?o])| 

+ O (l^AT,™, - 0o|) + O (||r)Ar,,„, - TyoD + P]v"'^eo.r,o 
= -Fb{-4o.))o(^ei,,77o(^JV,rnc - ^'o) + Bg^^rioiVN ,7nc ~ %]) " ^ § n .^aSlN .,^c + ^»o,r,o} 

+ O (\eN^rr.c - ^Ol) + O {\\f^N,rac - + ^'o^^,^^,^^,^^ + P^v^'^.^o 

= op.(A^-i/2), (7.39) 
and, furthermore, that 

+ O {\eN.mc - 0O|) + O {\\ffN,nc - %ir) + P]^""5eo,r,o [h*] 

= op,{N-^/^). (7.40) 



By Condition 3.6 and $\alpha\beta > 1/2$, $\sqrt{N}O_{P^*}(\|\hat\eta_{N,mc} - \eta_0\|^{\alpha}) = o_{P^*}(1)$. So by Condition 3.7 and taking the difference of (7.39) and (7.40), we have



-P 



op{N-^l^)~op{N-^'^), 



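or

\[
I_0(\hat\theta_{N,mc} - \theta_0) = \mathbb{P}_N^{\pi,mc}(\dot\ell_{\theta_0,\eta_0} - B_{\theta_0,\eta_0}[h^*]) + o_{P^*}(N^{-1/2}).
\]

It follows by the invertibility of $I_0$ that

\[
\sqrt{N}(\hat\theta_{N,mc} - \theta_0) = \sqrt{N}\,\mathbb{P}_N^{\pi,mc} I_0^{-1}(\dot\ell_{\theta_0,\eta_0} - B_{\theta_0,\eta_0}[h^*]) + o_{P^*}(1).
\]

Now, we recognize that the summand inside $\mathbb{P}_N^{\pi,mc}$ is the efficient influence function for $\theta$ and apply Theorem 5.3. □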



Proof of Theorem 3.3. Theorem 3.1 for cases for Of/^™ and 6*^*^™ are proved in 

[6, 7]. We only consider the WLE with modified calibration, $\hat\theta_{N,mc}$. The other four estimators for both theorems are similar.

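Under stratified Bernoulli sampling, independence of sampling indicators allows us to proceed in the same way as in the proofs of Theorems 3.1 and 3.2 to conclude $\sqrt{N}(\hat\theta^{Bern}_{N,mc} - \theta_0) = \sqrt{N}\mathbb{P}_N^{\pi,mc}\tilde\ell_0 + o_{P^*}(1)$ and asymptotic linearity of $\hat\alpha_N$ in Proposition 7.1. In view of (7.38),

\[
\sqrt{N}(\hat\theta^{Bern}_{N,mc} - \theta_0) = \sqrt{N}\mathbb{P}_N f + o_{P^*}(1), \qquad (7.41)
\]

where

\[
f = \frac{\xi}{\pi_0(V)}\tilde\ell_0 - \frac{\xi - \pi_0(V)}{\pi_0(V)}Q_{mc}\tilde\ell_0.
\]

Apply the central limit theorem and compute

\begin{align*}
\mathrm{Var}(f)
&= \mathrm{Var}\left( E\left[ \frac{\xi}{\pi_0(V)}\tilde\ell_0 - \frac{\xi - \pi_0(V)}{\pi_0(V)}Q_{mc}\tilde\ell_0 \,\Big|\, X, V \right] \right)
 + E\left[ \mathrm{Var}\left( \frac{\xi}{\pi_0(V)}\tilde\ell_0 - \frac{\xi - \pi_0(V)}{\pi_0(V)}Q_{mc}\tilde\ell_0 \,\Big|\, X, V \right) \right] \\
&= \mathrm{Var}(\tilde\ell_0) + E\left[ \mathrm{Var}\left( \frac{\xi}{\pi_0(V)}(I - Q_{mc})\tilde\ell_0 \,\Big|\, X, V \right) \right] \\
&= \mathrm{Var}(\tilde\ell_0) + E\left[ \frac{1 - \pi_0(V)}{\pi_0(V)}\{(I - Q_{mc})\tilde\ell_0\}^{\otimes 2} \right].
\end{align*}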

□ 
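The variance computation in this proof is an application of the law of total variance under Bernoulli sampling, where $\mathrm{Var}(\xi/\pi_0(V) \mid X, V) = (1 - \pi_0(V))/\pi_0(V)$. The following sketch checks the resulting decomposition exactly, in rational arithmetic, for a scalar influence value `l` and a scalar calibration term `q`. The four-point population `POINTS`, its numerical values, and the function name `variance_two_ways` are hypothetical and serve only as an illustration; in the setting of the proof `q` and `pi` would be functions of the phase I variable $V$.

```python
from fractions import Fraction as F

# Hypothetical discrete population: each entry is
# (probability w, influence value l, calibration term q, sampling prob. pi).
# q and pi are constant within the two made-up strata.
POINTS = [
    (F(1, 4), F(2), F(1), F(1, 2)),
    (F(1, 4), F(-1), F(1), F(1, 2)),
    (F(1, 4), F(3), F(-2), F(1, 5)),
    (F(1, 4), F(-4), F(-2), F(1, 5)),
]


def variance_two_ways(points):
    """Return (direct, decomposed) variances of

        T = (xi / pi) * l - ((xi - pi) / pi) * q,   xi ~ Bernoulli(pi).
    """
    # Direct computation: enumerate xi in {0, 1} at every support point.
    mean = F(0)
    second = F(0)
    for w, l, q, pi in points:
        for xi, p_xi in ((1, pi), (0, 1 - pi)):
            t = xi / pi * l - (xi - pi) / pi * q
            mean += w * p_xi * t
            second += w * p_xi * t * t
    direct = second - mean * mean

    # Decomposition: Var(l) + E[(1 - pi) / pi * (l - q)^2].
    mean_l = sum(w * l for w, l, q, pi in points)
    var_l = sum(w * l * l for w, l, q, pi in points) - mean_l ** 2
    extra = sum(w * (1 - pi) / pi * (l - q) ** 2 for w, l, q, pi in points)
    return direct, var_l + extra


direct, decomposed = variance_two_ways(POINTS)
print(direct == decomposed)
```

Because the arithmetic is exact, the two quantities agree identically rather than up to simulation error.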

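Proof of Corollary 3.2. We only consider the WLE with modified calibration, $\hat\theta_{N,mc}$. The other two cases are similar.

Let $Q_{mc}\tilde\ell_0 = AZ$ where $A = A_1A_2$ with $A_1 = P_0[(\pi_0^{-1}(V) - 1)\tilde\ell_0Z^T]$ and $A_2 = \{P_0[(\pi_0^{-1}(V) - 1)Z^{\otimes 2}]\}^{-1}$. Recall that $\Sigma^{Bern} = \mathrm{Var}\{(\xi/\pi_0(V))\tilde\ell_0\}$. In view of (7.41), it suffices to show that $\mathrm{Cov}\{(\xi/\pi_0(V))\tilde\ell_0, (\xi/\pi_0(V) - 1)AZ\}$ is equal to $\mathrm{Var}((\xi/\pi_0(V) - 1)AZ)$. This is true since

\begin{align*}
\mathrm{Cov}\left\{ \frac{\xi}{\pi_0(V)}\tilde\ell_0, \left( \frac{\xi}{\pi_0(V)} - 1 \right)AZ \right\}
&= E\left[ \frac{\xi}{\pi_0(V)}\left( \frac{\xi}{\pi_0(V)} - 1 \right)\tilde\ell_0Z^T \right]A^T \\
&= E\left[ \tilde\ell_0Z^T E\left\{ \frac{\xi}{\pi_0(V)}\left( \frac{\xi}{\pi_0(V)} - 1 \right) \Big|\, X, V \right\} \right]A^T \\
&= P_0[(\pi_0^{-1}(V) - 1)\tilde\ell_0Z^T]A^T = A_1A_2A_1^T
\end{align*}

and

\begin{align*}
\mathrm{Var}\left\{ \left( \frac{\xi}{\pi_0(V)} - 1 \right)AZ \right\}
&= AE\left[ \left( \frac{\xi}{\pi_0(V)} - 1 \right)^2 Z^{\otimes 2} \right]A^T \\
&= AP_0[(\pi_0^{-1}(V) - 1)Z^{\otimes 2}]A^T = AA_2^{-1}A^T = A_1A_2A_1^T.
\end{align*}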

□ 
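The identity used in this proof, $\mathrm{Cov}\{(\xi/\pi_0)\tilde\ell_0, (\xi/\pi_0 - 1)AZ\} = \mathrm{Var}\{(\xi/\pi_0 - 1)AZ\}$ with $A = A_1A_2$, can be checked exactly on a small example. The sketch below assumes a scalar score `l`, a two-dimensional calibration variable `z`, and Bernoulli sampling indicators; the population `POP`, its values, and the function name `cov_and_var` are hypothetical choices made only for illustration.

```python
from fractions import Fraction as F

# Hypothetical population: (probability w, scalar l, 2-vector z, sampling prob. pi).
POP = [
    (F(1, 4), F(2), (F(1), F(0)), F(1, 2)),
    (F(1, 4), F(-1), (F(1), F(3)), F(1, 2)),
    (F(1, 4), F(3), (F(0), F(1)), F(1, 5)),
    (F(1, 4), F(-4), (F(2), F(1)), F(1, 5)),
]


def cov_and_var(pop):
    """Return (Cov{(xi/pi) l, (xi/pi - 1) A z}, Var{(xi/pi - 1) A z})."""
    # A1 = P0[(1/pi - 1) l z^T] (1 x 2) and M = P0[(1/pi - 1) z z^T] (2 x 2).
    a1 = [sum(w * (1 / pi - 1) * l * z[a] for w, l, z, pi in pop)
          for a in range(2)]
    m = [[sum(w * (1 / pi - 1) * z[a] * z[b] for w, l, z, pi in pop)
          for b in range(2)] for a in range(2)]
    # A2 = M^{-1} via the closed-form inverse of a 2 x 2 matrix.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    a2 = [[m[1][1] / det, -m[0][1] / det],
          [-m[1][0] / det, m[0][0] / det]]
    # A = A1 A2 (1 x 2 row vector).
    a = [sum(a1[b] * a2[b][c] for b in range(2)) for c in range(2)]

    # Exact moments by enumerating xi in {0, 1} at every support point.
    ex = ey = exy = eyy = F(0)
    for w, l, z, pi in pop:
        az = a[0] * z[0] + a[1] * z[1]
        for xi, p_xi in ((1, pi), (0, 1 - pi)):
            x = xi / pi * l           # (xi / pi) * l
            y = (xi / pi - 1) * az    # (xi / pi - 1) * A z
            p = w * p_xi
            ex += p * x
            ey += p * y
            exy += p * x * y
            eyy += p * y * y
    return exy - ex * ey, eyy - ey * ey


cov, var = cov_and_var(POP)
print(cov == var)
```

Both quantities reduce to $A_1A_2A_1^T$, which is why the covariance term and the variance term cancel as claimed in the proof.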



Proof of Corollary 3.3. (1). We first consider stratified Bernoulli sampling. The case for $\hat\theta_{N,c}$ was proved in [3]. We only consider the WLE with modified calibration, $\hat\theta_{N,mc}$. The other two cases, (3.19) and (3.21) corresponding to $\hat\theta_{N,e}$ and $\hat\theta_{N,cc}$, are similar.

For $Z \equiv (Z^{(1)}, \ldots, Z^{(J)})^T$ with $Z^{(j)} = I(V \in \mathcal{V}_j)Z^T$, we compute $A_1 = P_0[(\pi_0^{-1}(V) - 1)\tilde\ell_0Z^T]$ and $A_2 = \{P_0[(\pi_0^{-1}(V) - 1)Z^{\otimes 2}]\}^{-1}$. Note that $Q_{mc}\tilde\ell_0 = A_1A_2Z$. The matrix $A_1 = [A_{1,1}, \ldots, A_{1,J}]$ is a partitioned matrix where



3X fc 



and the matrix A2 is the block diagonal matrix the jth block of which is 



A^ 



2,J 



VjPa\j 



Pj 



okxk 



Thus, the matrix $A = A_1A_2$ is a partitioned matrix $A = [A_1, \ldots, A_J]$ where

\[
A_j = A_{1,j}A_{2,j} = P_{0|j}(\tilde\ell_0Z^T)\{P_{0|j}Z^{\otimes 2}\}^{-1}.
\]

It follows by the definition of the Z'^' 's that 



P 



'ob [h - AjzY^ = Poy {{I - Q['^)eo} 



«i2 
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Since 

02 



and 

\[
P_{0|j}(\tilde\ell_0Z^T)A_j^T = P_{0|j}(\tilde\ell_0Z^T)\{P_{0|j}Z^{\otimes 2}\}^{-1}P_{0|j}(\tilde\ell_0Z^T)^T,
\]

it follows that 

Substitution of this into (3.17) gives (3.20). 

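(2). Next, we consider the second part of Corollary 3.3 concerning stratified sampling without replacement. For $Z = (Z^{(1)}, \ldots, Z^{(J)})^T$ with $Z^{(j)} = I(V \in \mathcal{V}_j)Z^T$, we compute $B_1 = P_0[(\pi_0^{-1}(V) - 1)\tilde\ell_0(Z - \mu_Z)^T]$ and $B_2 = \{P_0[(\pi_0^{-1}(V) - 1)(Z - \mu_Z)^{\otimes 2}]\}^{-1}$. Note that $Q_{cc}\tilde\ell_0 = B_1B_2Z$ and $\mu_Z = (\mu_{Z,1}, \ldots, \mu_{Z,J})^T$. The matrix $B_1 = [B_{1,1}, \ldots, B_{1,J}]$ is a partitioned matrix where

and the matrix $B_2$ is the block diagonal matrix the $j$th block of which is

Thus, the matrix $B = B_1B_2$ is a partitioned matrix $B = [B_1, \ldots, B_J]$ where

\[
B_j = B_{1,j}B_{2,j} = P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)\{P_{0|j}(Z - \mu_{Z,j})^{\otimes 2}\}^{-1}.
\]

It follows by the definition of the $Z^{(j)}$'s that

\begin{align*}
\mathrm{Var}_{0|j}\{(I - Q_{cc})\tilde\ell_0\} &= \mathrm{Var}_{0|j}\{\tilde\ell_0 - B(Z - \mu_Z)\} \\
&= \mathrm{Var}_{0|j}\{\tilde\ell_0 - B_j(Z - \mu_{Z,j})\} = \mathrm{Var}_{0|j}\{(I - Q_{cc}^{(j)})\tilde\ell_0\}.
\end{align*}

Then, since

\begin{align*}
\mathrm{Var}_{0|j}(B_j(Z - \mu_{Z,j})) &= B_j\mathrm{Var}_{0|j}(Z)B_j^T \\
&= P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)\{\mathrm{Var}_{0|j}(Z)\}^{-1}P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)^T,
\end{align*}

and

\begin{align*}
\mathrm{Cov}_{0|j}(\tilde\ell_0, B_j(Z - \mu_{Z,j})) &= P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)B_j^T \\
&= P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)\{\mathrm{Var}_{0|j}(Z)\}^{-1}P_{0|j}(\tilde\ell_0(Z - \mu_{Z,j})^T)^T,
\end{align*}

it follows that

\[
\mathrm{Var}_{0|j}\{(I - Q_{cc}^{(j)})\tilde\ell_0\} = \mathrm{Var}_{0|j}(\tilde\ell_0) - \mathrm{Var}_{0|j}\{Q_{cc}^{(j)}\tilde\ell_0\}.
\]

Substitution of this last identity into (3.12) gives (3.22). □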
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